REMARKS

Claims 1-24 are pending in this application. Claims 1-3 have been amended. Claims 4-21 
and 23-24 have been withdrawn as the result of an earlier restriction requirement. Claim 22 has
been cancelled. New claims 25-36 have been added. The specification has been amended to 
correct various typographical errors. The amendments and new claims do not add new matter. In 
view of the Office's earlier restriction requirement, Applicants retain the right to present claims 
4-21 & 23-24 in a divisional application. 
Amendments to the specification 

Paragraph 0001 of the application is amended to clarify the status of this and related 
applications. This correction is a result of Applicant's cancellation of claim 22, which is claimed 
in the earlier application U.S. Ser. No. 10/436,376, filed May 12, 2003.

Also, various passages and two tables in the specification have been amended to correct 
typographical errors. All text and tables deleted in the current amendments correct errors that 
would be immediately obvious from reading the specification and Fig. 1. Specifically, the 
deleted text refers to molecular marker Satt228, which is located on Molecular Linkage Group 
(MLG) A2. The noted marker and MLG are incorrectly identified due to a typographical error. 
The specification and Fig. 1 clearly disclose the correct molecular markers that are associated 
with Rps8 (i.e., Satt595, Satt516, Satt114, Satt334, Sat-317, Satt335, Satt510, Satt144, and
Sat-197), all of which are on MLG F. See specification at Fig. 1, ¶¶ 0007, 0039, 0063, 0083 - 0085,
Tables 6 & 7, and Examples 4 & 5.

The reason for the typographical errors is that the inventors' first attempt at localizing the 
novel Rps8 locus resulted in aberrant results that were described in two U.S. Provisional patent
applications (see Burnham et al. (2003), U.S. Provisional Application No. 60/379,304, filed May
10, 2002; U.S. Provisional Application No. 60/427,637, filed November 19, 2002; and U.S. Ser.
No. 10/436,376, filed May 12, 2003). Upon discovering the errors reported in those
applications, Applicants filed a new application and disclosed the correct methodology and
molecular markers. The correct method and markers are the subject of the instant patent
application; however, a few passages in the instant application were inadvertently duplicated
from the earlier applications, thus resulting in inclusion of obviously incorrect information in the
instant specification. Applicants hereby amend the specification to correct the inadvertent and 
erroneous information, specifically, references to Satt228 and MLG A2. Thus, the amendments 
do not add any new matter and are fully supported by the bulk of the specification, Fig. 1 and the 
working examples as stated above. 
Claim rejection - 35 USC §112, second paragraph 

The Office has rejected claim 3 as being indefinite. Claim 3 has been amended to more 
particularly point out and distinctly claim the subject matter which Applicants regard as the
invention. Withdrawal of the rejection is respectfully requested. 
Claim rejection - 35 USC §112, Enablement 

Claims 1-3 have been rejected as failing to comply with the enablement requirement 
because, in the Office's view, "[t]he assignment of molecular markers to particular traits is 
unpredictable and population-specific." The Office has cited several references to support this
statement. 

The test of enablement is whether one reasonably skilled in the art could, without undue 
experimentation, make or use the invention from the disclosures in the patent coupled with
information known in the art. (MPEP §2164.01.) The factors to consider when determining
whether a disclosure satisfies the enablement requirement and whether any necessary
experimentation is "undue" include: the amount of direction provided by the inventor; the
existence of working examples; the level of one of ordinary skill; the state of the prior art; the
breadth of the claims; the level of predictability in the art; the nature of the invention; and the
quantity of experimentation needed to make or use the invention based on the content of the
disclosure. (MPEP §2164.01(a) (citing In re Wands, 858 F.2d 731, 737, 8 USPQ2d 1400, 1404
(Fed. Cir. 1988)).) It is improper to conclude that a disclosure is not enabling based on an
analysis of only one of the above factors while ignoring one or more of the others. (Id.)
Applicants' analysis of the instant claims according to the Wands factors follows: 

Like the disclosure at issue in In re Wands, the instant disclosure provides considerable
direction and guidance on how to practice the invention as claimed in claims 1-3. The disclosure
provides working examples. The level of skill in the art is high, requiring a practitioner to use
molecular biology techniques, and the state of the prior art was such that all of the methods as
well as the molecular markers needed to practice the invention were well known at the time the
application was filed. (Information relating to the sequence of PCR primers to the 600+ SSR loci
reported in Cregan et al. (1999) and a standard protocol for their amplification can be obtained
on the USDA-ARS Soybean Genome Database, Soybase, at http://soybase.org/ (set up in 1998,
verified June 29, 2006).)

The claimed invention generally involves determining the presence or absence of 
Phytophthora sojae resistance in a soybean as indicated by the presence or absence of a
newly-discovered resistance locus (Rps8), which maps to linkage group MLG F. (Specification at
¶ 0007.) According to the method, genomic DNA from a soybean is analyzed for the presence of
the Rps8 locus. (Id.) The presence of the Rps8 locus is determined through the use of one or
more molecular markers linked to Rps8. (Id.) This method is generally known as marker
assisted selection (MAS). (¶ 0028.) Applicants provide the declaration of Dr. Anne Dorrance
(Exhibit A) for an explanation of MAS.

Referring to the specification, Applicants first identified the presence of a new P. sojae 
resistance gene, Rps8, in the plant introduction PI 399073. (Specification at ¶ 0038.) Then, using
routine and well known crossing and breeding methods, Applicants crossed the soybean plant
carrying the Rps8 trait locus with a soybean plant having other specific desirable traits to
produce a soybean line possessing the combination of desired phenotypes. (Specification at
¶ 0046; see generally ¶¶ 0043-0046.) Next, several crosses were created (HFX01-602, OX-98317,
OX-99218, and OX-99128, disclosed in ¶¶ 0010-13 respectively) to produce progeny containing
the Rps8 trait locus, as determined by their resistance to particular P. sojae pathotypes. (See
specification at ¶ 0067 and Table 1, showing the resistance of OX-99218 to P. sojae pathotypes
OH30 and OH4; ¶ 0071 and Table 3, showing the resistance of OX-99128 to P. sojae pathotypes
OH30 and OH4; ¶ 0079 and Table 5, showing the resistance of the cross Williams x PI399073 to
P. sojae pathotypes OH1 and OH25.) The seeds of one of the resultant germplasms, designated
HFX01-602, were deposited with the ATCC in accordance with the Budapest Treaty.
(Specification at ¶ 0074.)

Having determined the presence of the Rps8 gene in a plant by phenotypic analysis, Applicants
analyzed the genome of that plant for the presence of markers associated with Rps8. (See ¶ 0081,
explaining that the cross used for analyzing SSR marker association was Williams x PI399073;
Example 2, generally explaining how the SSR markers linked to Rps8 were identified; and
Example 4, the use of JoinMap and well known statistical analysis to determine which marker
associations were statistically significant.) Applicants also confirmed their results using another
cross. (Specification at ¶¶ 0089-93, Example 5.)

With respect to the MAS methodology, both the specification and publications in the art
explain that the predictability of MAS increases if the number of markers is increased, and the
markers bracket or flank the trait locus. (See specification at ¶ 0055; see also Demirbas et al.
(2001), p. 220, 2nd col., 2nd & 3rd ¶¶.) While claim 1 recites the use of at least two markers,
there are several molecular markers disclosed in the specification, some of which flank the trait
locus, and a number of additional markers associated with the region of interest were known and
available to those skilled in the art.
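
As a rough illustration of why flanking markers improve the predictability of MAS, the following Python sketch compares the chance of a false selection when one linked marker is used against the chance when two markers flanking the trait locus are used. The recombination fractions shown are hypothetical values chosen for illustration and are not taken from the specification, and the calculation assumes that crossovers in the two marker-locus intervals occur independently (no interference).

```python
def miss_rate_single(r):
    """Chance that a gamete carrying the resistant-parent marker allele
    lacks the trait allele, for one marker at recombination fraction r."""
    return r


def miss_rate_flanking(r1, r2):
    """Chance that a gamete carrying the resistant-parent alleles at BOTH
    flanking markers lacks the trait allele.

    Assumes independent crossovers in the two intervals: the trait allele
    is lost only if a crossover occurs on each side of the locus.
    """
    double_crossover = r1 * r2
    no_crossover = (1 - r1) * (1 - r2)
    return double_crossover / (no_crossover + double_crossover)


if __name__ == "__main__":
    # Hypothetical recombination fractions between each marker and the locus.
    r_left, r_right = 0.10, 0.05
    print("single marker (closer one):", miss_rate_single(r_right))
    print("two flanking markers:     ", miss_rate_flanking(r_left, r_right))
```

Under these assumed values, selecting on both flanking markers yields a markedly lower chance of a false selection than selecting on the closer marker alone, which is consistent with the general teaching that bracketing the trait locus increases predictability.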

As Dr. Dorrance explains in her declaration (Declaration of Anne Dorrance, Page 2, 
Paragraphs 5 and 6), the inventors' own method, the extent of experience reported in the art with 
MAS, and the extent of information about molecular markers that was readily available to those 
skilled in the art as of the date the application was filed, were more than sufficient to enable a
skilled artisan to successfully perform MAS for purposes of identifying soybean plants having 
the Rps8 locus. 

Accordingly, Applicants maintain that the enablement requirement has been satisfied in 
view of the state of the art, together with the extent of disclosure in the specification regarding 
the Rps8 locus and various markers therefor.

As regards the scope of the claims, claim 1 recites soybean as the only genus in which 
the invention is practiced. (See specification at ¶ 0046, explaining that the invention may be
applied generally to any plant variety of the genus Glycine, or soybean.) For this reason, the 
Office's reliance on Westman et al. (1997) ("Westman") is in error. Westman used molecular 
markers developed in the genus Arabidopsis to amplify marker loci in six Brassica crop species. 
(Westman, abstract, lines 5-8.) Westman, therefore, was evaluating whether markers developed
in one species could work "across taxa" in another species. This is not the method of the instant
application, which is limited to molecular markers in only one species: soybean. 

As regards predictability of the art, several factors negate the Office's conclusions 
about the unpredictability of MAS across different populations of soybean. First, the Office's 
reliance on Michelmore et al. (1991), van Ooijen et al. (1994) and Concibido et al. (1997) for the 
conclusion that these references "teach that it is unpredictable whether any particular PCR- 
derived or RFLP molecular marker developed with one population of soybeans may be 
successfully utilized with another population comprising the same species, or with interspecies
hybrids" is misplaced. Michelmore, Concibido and van Ooijen (as well as Lee et al. (1996)) only
discuss non-SSR markers such as restriction fragment length polymorphism (RFLP) and random
amplified polymorphic DNA primers (RAPDs). Not only are these references irrelevant as
regards claims 2 and 3, which recite SSR markers, but the teaching in these references cannot
be extended to include "any particular PCR derived" molecular markers because these references
specifically exclude SSR markers. 

SSR markers are very different from RFLPs. This is because, as Cregan explains, only
rarely have more than two alleles been identified at RFLP loci in soybean. (Cregan, p. 1464, col.
2, second ¶.) Thus, because these two alleles generally have asymmetric frequencies, the
likelihood that any two genotypes will be polymorphic at a particular RFLP locus is relatively
low. (Id.) For this reason, using RFLPs alone, a polymorphic fragment mapped in one population
may not be segregating in another. (Id.) A second drawback of using only RFLPs is the detection
of multiple DNA fragments (i.e. multiple loci) with most probes. (Id.) The multiplicity of RFLP
loci can make RFLP linkage maps ambiguous with respect to RFLP locus identity, and often
precludes the use of such loci for the evaluation of linkage group homology among different
maps. (Id. at p. 1465, 1st ¶.)

In sharp contrast to RFLP markers, SSR markers are extremely useful because "the high 
levels of polymorphism, co-dominant inheritance, and the locus specificity of SSR markers in 
soybean" together with their "random distribution in the genome" make SSR markers "an 
excellent complement to RFLP markers for use in soybean molecular biology genetics and
plant-breeding research." (Shoemaker et al. (1994) (Exhibit B), p. 241, 2nd ¶, lines 6-7 and 15-18.
See also Song et al. (2004) (Exhibit C), p. 123, 1st col., 1st ¶, lines 7-11: "Most SSRs are
single-locus markers, and many SSR loci are multi-allelic. These characteristics make SSRs an ideal
marker system not only for creating genetic maps, but also as an unambiguous means of defining 
linkage group homology across mapping populations.") Applicants refer to the declaration of Dr. 
Dorrance for an explanation of the fact that SSRs map consistently to the same genomic region 
across different soybean populations. (Declaration of Anne Dorrance, pages 2-3, Paragraphs 7 
and 8.)

With regard to the Lee and Concibido references cited by the Office, these references are 
simply inapplicable to the claimed invention because both Lee and Concibido used molecular 
markers to identify quantitative trait loci (QTLs); they were not in any way concerned with any
single gene trait locus, such as Rps8. QTLs are not analogous to the Rps8 single gene locus of
the present application because many genes affect QTLs (see, e.g., Lee et al., page 517, col. 1,
second paragraph). In contrast, Rps8 is a single dominant gene, which segregates according to
Mendelian genetic principles (Specification at ¶ 0046, 1st sentence; and ¶¶ 0078-79, disclosing a
3:1 resistant to susceptible ratio for the different families resulting from the Williams x
PI399073 cross, which is indicative that Rps8 segregated as a single dominant gene). This means that
1 in 4 progeny of a cross between a parent containing Rps8 and a parent that does not have Rps8
will have Rps8. Thus, there is a much higher probability (i.e. predictability) of successful MAS
for a single gene locus as compared to a QTL. The context of the Lee and Concibido references
is not analogous to that of the instant case, and one of ordinary skill in the art would find no
motivation to use the methods described in these references for purposes of evaluating a single
dominant gene locus.
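
For context, a 3:1 resistant to susceptible ratio of the kind referenced above is conventionally checked with a chi-square goodness-of-fit test. The Python sketch below shows one way such a check might be carried out. The plant counts in the usage lines are hypothetical and are not data from the specification; the only fixed value is the standard chi-square critical value of 3.841 for one degree of freedom at the 0.05 significance level.

```python
# Standard chi-square critical value for 1 degree of freedom at alpha = 0.05.
CRITICAL_VALUE_DF1_P05 = 3.841


def chi_square_3_to_1(resistant, susceptible):
    """Chi-square statistic for observed counts against a 3:1 expectation."""
    total = resistant + susceptible
    expected_resistant = 0.75 * total
    expected_susceptible = 0.25 * total
    return ((resistant - expected_resistant) ** 2 / expected_resistant
            + (susceptible - expected_susceptible) ** 2 / expected_susceptible)


def fits_single_dominant_gene(resistant, susceptible):
    """True if the counts do not differ significantly from a 3:1 ratio."""
    return chi_square_3_to_1(resistant, susceptible) < CRITICAL_VALUE_DF1_P05


if __name__ == "__main__":
    # Hypothetical counts of resistant and susceptible plants.
    for counts in [(70, 30), (60, 40)]:
        print(counts, chi_square_3_to_1(*counts), fits_single_dominant_gene(*counts))
```

A statistic below the critical value means the observed counts are statistically consistent with segregation of a single dominant gene; a statistic above it means the 3:1 hypothesis is rejected at that significance level.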

Finally, similar to the analysis in Wands, the nature of the technology is such that it 
involves screening plants to determine which ones carry the desired Rps8 trait locus. Therefore, 
practitioners of this art are prepared to screen negative plants in order to find one that carries the 
desired trait. (See In re Wands, 8 USPQ2d 1400, 1406 (explaining that "the nature of monoclonal
antibody technology is that it involves screening hybridomas to determine which ones secrete
antibody with desired characteristics" and so "practitioners of this art are prepared to screen
negative hybridomas in order to find one that makes the desired antibody" and that such
screening did not constitute "undue experimentation").) Claim 1 has been amended for clarity to
recite that detecting the presence of the molecular markers provides an indication that the trait
locus Rps8 is present in the soybean. (See specification at Summary, ¶ 0007, first sentence.) It is
well known that MAS is an "indirect" method of selection based on the premise that if a plant
carries a marker associated with a particular trait locus, there is a probability that the plant carries
the trait locus. With a higher probability of successful selection, based on a high level of
statistical significance, there is a higher predictability that the methods will be useful. 

"The presence of inoperative embodiments within the scope of a claim does not 
necessarily render a claim nonenabled." (MPEP §2164.08(b).) Claims reading on significant
numbers of inoperative embodiments would render claims nonenabled only when the
specification does not clearly identify the operative embodiments and undue experimentation is
involved in determining those that are operative. (Id.) The instant application clearly identifies
the operative embodiments and, given the amount of guidance in the specification, the amount of 
experimentation required in determining the operative embodiments is not undue. This is 
because those skilled in the art tailor the molecular markers to the particular cross (or population) 
that they are interested in, thereby increasing the predictability of successful marker assisted 
selection. Several maps of related species, showing how regions can be reorganized, are known 
in the art and are readily available, for example on the Soybase database referenced above, so that
skilled artisans would be able to tailor markers to the population of interest. The test for undue
experimentation is "not merely quantitative," and "a considerable amount of experimentation is
permissible, if it is merely routine, or if the specification in question provides a reasonable
amount of guidance with respect to the direction in which the experimentation should proceed."
(Ex parte Jackson, 217 USPQ 804, 807 (1982).)

In view of the Wands factor analysis above, Applicants submit that the instant
specification provides the requisite "reasonable amount of guidance" by specifically disclosing
the plant phenotypes that have the desired and novel Rps8-associated P. sojae resistance trait,
and by mapping the Rps8 locus to MLG F, to enable a skilled artisan to determine whether a
soybean plant has the Rps8 trait locus. Moreover, by providing SSR molecular markers that are
associated with the Rps8 locus, two of which flank the locus, the predictability of the MAS is
sufficiently increased to enable a person skilled in the art to make and/or use the invention as 
claimed. For the foregoing reasons, withdrawal of the enablement rejection is respectfully 
requested.
Claim rejection - 35 USC §102, Anticipation

Claims 1-3 have been rejected as being anticipated by Demirbas et al. (2001) 
("Demirbas"). 

Amended claim 1 recites a method for determining the presence of trait locus Rps8 in a
soybean by analyzing the genomic DNA of the soybean for the presence of at least two
molecular markers associated with trait locus Rps8. Although Demirbas discloses Satt114,
which Applicants discovered to be associated with Rps8, Demirbas, which is concerned with
Rps3, does not disclose any other markers associated with Rps8. Since claim 1 recites at least
two molecular markers associated with Rps8, Demirbas does not anticipate claim 1.

Claims 2 and 3 depend from claim 1 and so are novel for at least the same reasons as 
claim 1. Withdrawal of the rejection is respectfully requested.
Claim rejection - 35 USC §103, Obviousness 

Claim 3 is rejected as obvious in view of Demirbas and Cregan et al. (1999). Applicants 
respectfully maintain that the Office has failed to make a prima facie case of obviousness. 

According to the Office, a skilled artisan would have been motivated to combine the 
Satt114 marker, disclosed in Demirbas to be associated with Rps3, and Satt516 marker, disclosed
in Cregan to be located on linkage group F, to arrive at claim 3. This, however, is a clear 
example of "improper hindsight" reconstruction of the invention. (MPEP §2142.) According to 
established law, although every element of a claimed invention may often be found in the prior 
art, identification in the prior art of each individual part claimed is insufficient to defeat 
patentability of the whole claimed invention. (In re Kotzab, 217 F.3d 1365, 1370 (Fed. Cir.
2000).) To establish obviousness based on a combination of the elements disclosed in the prior
art, there must be some suggestion or motivation, either in the references themselves or in the
knowledge generally available to one of ordinary skill in the art, to modify the reference or to
combine reference teachings. (MPEP §2143.01 I.)

There is absolutely no teaching, motivation, or suggestion either in Cregan or in 
Demirbas, to combine Satt114 and Satt516 to find a new P. sojae resistance locus on linkage
group F. This is because first, a skilled artisan would have had to find a plant that possessed the
required phenotype (resistance to P. sojae pathotypes vir 1a, 1b, 1c, 1d, 1k, 2, 3a, 3b, 3c, 4, 5, 6
and 7) before finding the novel gene which conferred the particular resistance to P. sojae. Such
a plant was not disclosed in either Demirbas or Cregan, and was not within the knowledge of one
of ordinary skill in the art because it was not known prior to the Applicants' disclosure thereof.
All soybeans would probably have a particular marker, such as Satt114, in their DNA; it is the
association of that marker with the new phenotype from the source plant that is novel and
non-obvious.

Additionally, Demirbas established that other P. sojae resistance loci Rps1 - Rps6 are
located on linkage groups N, J, F, and G. Thus, absent considerable experimentation, a skilled
artisan in possession of the disclosure by Demirbas would not have known which linkage group
a new Rps locus would map to, and would not have been motivated to choose a molecular
marker associated with linkage group F to combine with any other molecular marker.

Lastly, although Demirbas et al. disclosed Satt114 to be "moderately linked" to Rps3, the
authors concluded that "Neither Satt114 nor Satt374 displayed any significant linkage to Rps3"
and excluded Satt114 as a useful marker for efficient marker assisted selection for Rps3. (See
Discussion, page 1226, 1st col., 2nd ¶, lines 2-7.) Thus, even if the location of the new Rps8
gene locus was found to be related to the location of Rps3, such a finding was not reported by
Demirbas or in any other art as of the filing date of the instant case, and importantly, Demirbas
teaches away from using Satt114 for marker assisted selection of Rps3 and provides no teaching
to use that marker for any other trait, much less Rps8.

Upon reading Demirbas and Cregan, one of ordinary skill would find no motivation to 
combine the teachings for the reasons given above. And even if the teachings of these references 
were combined, the combination would not provide the methods of the instant claims because, as 
noted above, the discovery of the novel Rps8 trait locus and the associated P. sojae resistance 
had not yet been made. For these reasons, Applicants maintain that claim 3 is not obvious. 
Withdrawal of the rejection is respectfully requested. 

New claims 25 to 36 have been added. Support for claim 25 is found in the specification 
at ¶ 0066, ¶ 0070, and Example 1. Claim 26 is supported by the specification at ¶ 0038 and
¶¶ 0045-46. Support for claims 27 to 34 is found in the specification at ¶ 0007 ("The presence of
the Rps8 gene is determined through the use of one or more molecular markers linked to Rps8");
¶ 0028; ¶¶ 0050-0055; and ¶¶ 0056-0061. Support for claims 35 and 36 is found in the
specification at ¶ 0067, ¶ 0071, ¶ 0079 and Tables 1, 3 & 5. The new claims do not add new
matter. 

It is respectfully submitted that the application is now in condition for allowance. 
Applicants respectfully request that a timely Notice of Allowance be issued in this case. 

Respectfully submitted, 

Date: September 5, 2006 By: /diane h. dobrea/ 

Diane H. Dobrea 
Reg. No. 48,578 
(614) 621-7788

IN THE UNITED STATES PATENT AND TRADEMARK OFFICE



In re Application of:

St. Martin et al.

Application No.: 10/778,018 

Filed: February 12, 2004 

For: Identification of Soybeans Having 
Resistance to Phytophthora Sojae 



Group Art Unit: 1638
Confirmation No.: 3349
Examiner: Keith O'Neal Robinson 
Attorney Docket No.: 22727/04212 



Commissioner for Patents 
P.O. Box 1450
Alexandria, VA 22313 

Declaration of Dr. ANNE DORRANCE under 37 C.F.R. § 1.132

I, Anne Dorrance, an inventor in the above-identified application, declare as follows: 

1. I received a Ph.D. from Virginia Polytechnic Institute and State University and 
postdoctoral training at Washington State University. Since then, I have been employed 
as faculty in the Department of Plant Pathology at The Ohio State University. I have
published 24 (3 more are in press) full-length publications in peer-reviewed international
scientific journals. 

2. Currently, I am an Associate Professor of Plant Pathology at The Ohio State University.

3. I am a co-inventor of the above application, and have directed research relating to 
soybean plants for nine years. I have published 12 (3 more are in press) peer-reviewed 
papers on resistance to Phytophthora sojae in scientific journals.

4. I would like to comment on the process of marker assisted selection or MAS to clarify
(a) our experiments, which form the basis of the patent application; and (b) how other 
scientists can successfully perform MAS based on the information in our application. 

5. The steps that are used in MAS may be summarized as follows: (i) identifying the
locus/gene of interest, i.e. Rps8, which confers the desired trait, i.e. a new,
Rps8-derived resistance to P. sojae pathotypes that will normally kill plants, including
plants that have one or more of the previously identified Rps genes (Rps1a, Rps1b,
Rps1c, Rps1d, Rps1k, Rps2, Rps3a, Rps3b, Rps3c, Rps4, Rps5, Rps6 or Rps7); (ii)
cross breeding plants that have the Rps8 locus with plants that do not have the Rps8
locus to develop progeny segregating for the trait; and (iii) identifying molecular markers
that are genetically linked to the Rps8 locus/gene and mapping the Rps8 locus. MAS
can then be performed on progeny developed from any cross with a parent that has
the Rps8 gene to determine which progeny carry the Rps8 gene. The parent with the
Rps8 gene is chosen based on its phenotypic characteristics. MAS is performed using
molecular markers from the region that was identified in step (iii).

6. As explained in the specification, we carried out steps (i), (ii) and (iii) and mapped the
Rps8 locus to a particular region on major linkage group (MLG) F. We also provide
nine SSR molecular markers, two of which flank the Rps8 locus, which are genetically
linked to Rps8. SSR markers consistently map to the same region of the Glycine
(soybean) genome. This means that in all soybean populations tested, SSR markers map
to a single locus in the genome with a map order that is essentially identical in all
populations. (Shoemaker et al. 2004 (Exhibit A), p. 243, 2nd ¶, lines 13-16.) This has
been demonstrated across a number of soybean populations and is the basis of the
"consensus" map of the soybean genome (Cregan et al., 1999). In other words, as a
result of the SSR cross-population consistency of mapping, Cregan, in 1999, was able to
align the 20+ linkage groups derived from each of three soybean populations into a
consensus set of homologous groups to correspond to the 20 pairs of soybean
chromosomes. Thus, the consensus map of the soybean genome was compiled by
combining SSR data from several Glycine populations. All of this information (the SSR
markers, the sequence of the PCR primers to the 600+ SSR loci reported in Cregan, a
standard amplification protocol, and the consensus map) is available to the soybean
community at http://soybase.org/.

7. Given the consistency of SSR data across soybean populations together with the
information provided in the instant application, scientists in the field would be able to
choose appropriate molecular markers for any cross that displays the Rps8-associated
resistance phenotype. Moreover, any molecular marker that maps to the region on MLG
F identified in our application as encompassing Satt516 to Satt114, and that is
polymorphic for the parents in that region, can be used for successful MAS.

8. I hereby declare that all statements made herein of my own knowledge are true and that 
all statements made on information and belief are believed to be true; and further that
these statements were made with the knowledge that willful false statements and the like
so made are punishable by fine or imprisonment, or both, under Section 1001 of Title 18
of the United States Code, and that such willful false statements may jeopardize the
validity of the application or any patent issued thereon. 



Dr. Anne Dorrance

RANDY C. SHOEMAKER

Iowa State University

PERRY B. CREGAN

Beltsville Agricultural Research Center
Beltsville, Maryland

6-1 THE SOYBEAN GENOME



Soybean [Glycine max (L.) Merr.] has emerged as a model crop system because of [...] 1999), a
well-developed [...] al., 1999), and the growing number of genetic tools applicable to this
biological system (reviewed in [...] 1999). It is also [...] and a multibillion-dollar crop of the
USA (Riley, 1999; SoyStats, 1997).

The soybean genome is of average size compared to that of many o[...] plants. It is comprised of
about [...] Mbp/C (Arumuganathan and Earle, 1991). This makes it about seven and one-half times
larger than the genome of Arabidopsis [...] times larger than rice (Oryza sativa L.). Still the
soybean genome is less than half the size of the corn (Zea mays L.) genome and more than 14 times
smaller than the genome of bread wheat (Triticum aestivum L.) (Arumuganathan and Earle, 1991).
Approximately 40 to 6[...] be defined as repetitive (Gurley et al., 1979; Goldberg, 1978). One
family of repetitive [...] is comprised of an approximate 120 bp monomer [...] repetitive
sequence families may be species-specific [...].

Bacterial artificial chromosome (BAC) libraries (Marek and Shoemaker, 1997; Da[...] et al., 1998;
Tomkins et al., 1999; Salimath and Bhattacharyya, 1999; D. Lightfoot, personal communication,
2002) also have been produced, which together cover the soybean genome many times over. [...]
have already been developed and reported using some of these libraries (Marek and Shoemaker,
1997), these libraries have been made from different genotypes and with a variety of enzymes and
most have been made available to the public. Yeast Artificial Chromosomes have also been created
for the purpose of chromosome walking and in situ hybridization (Zhu et al., 1996).

[...] American Society of Agronomy, Crop Science Society of America, Soil Science Society [...]
and Uses, 3rd ed., Agronomy Monograph no. 16.

K degree lo wWh soybean chromosomes conatiict Uunng me.0B,» has mode 

it diflS w Suet cytoeic analyses. However, n complete l«^0We has now 
Len eooVred (Singh luid Hymowitz. 1 988) btised upon pachytene analysis. Annly^ 

upofhetor^luomaiin, with the short armofsi)tonhe20blvttlent8 being completely 

mosome; 2nU\)t>rt useful for quickly locating genes onto a spf^'ho <3h>omosom^ 

Td ?oTa sociating linicage groups with specific '^'^r ^''^'^'tl^r^rSoC 
lines that Dr. R. Palmer (USDA-ARS) suppli«=d, and some generoted^^ 

Cytogenetics LubutUrbnna.CliampuignJLacompleteaeof20rto^ 
eacli chromosome exists in an extra copy, is now comple ed (Xu e '^^s); "'f 
Sk ""l undoubtedly be useful Ibr integrating classical and molecular genetic • 
Zch as similar cytogenetic collections have been '"^Fna-u ^^^^^^^^ nee. bttr- 

Soybean has a diploid chromosome number of 2n = 40. However, most genera
in the Phaseoleae have a genome complement of 2n = 22 (Hymowitz et al., 1998).

ancestor (« « 1 0 which underwent ancuplold loss to n a 10 and bubsequenipoiy 

distant p^st KvS pite of being a polyploid, the genome, for the most part. 
S a ilS^^^^ 'di?loidi.ation' of polyploids is a well-Icnown pmcess «nd 
SusTd by addUions. deletions, mutations, and rearrangements that rapidly inh bit 
on l oS^^^^^^ puiring of linkage groups (Ohno. 1970). Exa^P os of ^^v^^^^^^^^^^^^^ 
;? duplionied genes have been repotted for soybean reccptoivUke Pro'»"i ftn^s 

moto and K.iap. 2001) and CLV I -like genes f^amnmoto et al. 2^ 
ever, this may be a relatively slow process and there remum many exceptions to 

the lamc tS ) caTbe imind in the soybean gennplasm cnllecUon f 
fdsTAn anUs of the average number of fragments detected by hybndization 
orJstricon^^^^^^^^^ 

ZoSSnes emphasised the abundant duplications found in the genorne (Shoe. 
5? I TrnTm'^ than 90% of the probes detected more than two fragtrteril 

fleSro may be single copy Rec|uonoe and that large amounts o the 6«'»"y 
ha^c underS-no genome duplication In addition to the presumed tetraplotdlzation 

''''*"\r.other analysis of hypomelhylated sequences uaing methylation sensitive 
re,tric5?eJiyTes ^ Bl^ly inore than 15% of the hypomct'^y^^^^^^ 

lenoni^^^^^^ (^ihu et nl.. 1904). The remamder ot the hy- 

foS^^ ateS gc^^^^^^^^ to be duplicicd or tnlddle-ttpetilive sequence. No 
evidence of silencing of the duplicated regions was observed by methylation.
Hybridization-based mapping has resolved many duplicated regions of the
genome (Cho et al., 1989). These homoeologous regions reflect segmental and
whole-genome duplication events and can provide much information about the
evolution of the genome (Fig. 6-1). Sequences are often duplicated in the soybean
genome in a manner not easily explained by a tetraploidization event. For example,
most linkage groups contain markers that can also be found on other linkage
groups, but examples of this can sometimes be extreme. For any given linkage group,
duplicate markers may be present on more than one other linkage group. The
average linkage group contains markers that can be found on eight other linkage groups
(Shoemaker et al., 1996).
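
The tally behind a statement such as "markers that can be found on eight other linkage groups" can be sketched as follows. This is a minimal Python illustration only. It assumes a hypothetical table listing, for each hybridization probe, the linkage groups on which that probe detects a locus; the probe names, group names, and data are invented for the example and are not taken from Shoemaker et al. (1996).

```python
from collections import defaultdict


def shared_linkage_groups(probe_hits):
    """Map each linkage group to the set of other groups that share a probe with it."""
    partners = defaultdict(set)
    for groups in probe_hits.values():
        unique = set(groups)
        for group in unique:
            # Every other group hit by the same probe is a duplication partner.
            partners[group] |= unique - {group}
    return dict(partners)


# Hypothetical probe -> linkage group assignments (illustrative only).
probe_hits = {
    "probe_1": ["LG1", "LG2"],
    "probe_2": ["LG1", "LG3"],
    "probe_3": ["LG2"],
    "probe_4": ["LG1", "LG2", "LG3"],
}

partners = shared_linkage_groups(probe_hits)
# Average number of other linkage groups sharing at least one probe.
mean_partners = sum(len(p) for p in partners.values()) / len(partners)
```

With real mapping data the same count, averaged over all linkage groups, would give the kind of summary figure quoted in the text.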

Mapping of duplicated genes controlling pubescence morphology (appressed
and non-appressed) provided interesting insight into the evolution of the genome.
Lee et al. (1999) mapped Pa1 and Pa2 to LG-B1/S and LG-P, respectively. It was
expected that these genes would map to homoeologous segments of the linkage
groups. However, other than the genes, no markers appeared in common between
these regions. Unexpectedly, the gene regions were implicated as paralogs through
an intermediate linkage group, LG-H, which connected to LG-B1/S and LG-P
regions through multiple markers. This suggested that regions of LG-B1/S and
LG-P (as well as LG-H) were evolutionarily related and that perhaps a third pubescence
gene existed and remained undetected.

Except for many disease-resistance genes, most agronomically important traits
are controlled by several to many genes acting in concert. The genetic locations of
the quantitative gene(s) are known as quantitative trait loci (QTL). Because of the
genetic by environment interactions on most quantitative traits, breeding for them
requires replicated field trials conducted over 2 or more years in a variety of
locations. This is obviously time consuming and expensive. The ability to select for an



LG-H 



LO-F 




1 iu 6-1. axnmplM or homcolosowB rogionii in fioybenn cicloctiib).; wilh hybniteation-hlisocl nilipplne 
lochnUiuos. In ihia exuniple LO-H liiis hnmwloijs on boUi l-O-B I unci l.G-P, while scBmcnts of M3- 
HI niKl iiw ouniiBOied (hroiigli a single markisr will) honwlony to t,C4I. From I.cc el m. 
(20(11). 
easily identifiable marker that is a good predictor of the
QTL trait can save time and money in a breeding program. Discovery and tagging
of QTL is a prerequisite of this type of Marker Assisted Selection (MAS).

Since iSnSy dozens olVeport.s have flowed out 

m rer™^^ trmis. with perhaps the most erilial deal ng with seed com. 

Sur(DlStal,.1992n)0ther^^^^ 

Kl?— ^^^^ 

r.-ifirtn'»i (»fficiencv and more. A thorough coverage of QTL mapping in soyDenn is 
foTexltS. S'^ri Detailed 8ummari74itlons of these sctidie. nnd otherB 
^irboTund i^sSBa^^^^ the USDA-sponson^d genomic dntnbane for soybean. «t 

stniCtuMdtL led .0 the suggestion that grasses can be^^onsidered to have .m- 
Toen^ie This has important implications in our ability to ixansfer genomic m^ 
t r oWSt on'e grass spU to that of -Jhcr ^^^^^^^ 
7fin and Frecling, 1993). Comparative mapping among legumes l^ai, not been as 
?lTe tS Bib tantin reaitangements that have occurred withm the soybean 

Sv leTX sS^^^^ chromosome segments between soybean and re- 

tKSTlBoutin e'ul,, 1995). Aitltough mung bean m>u, " (^,0 
rijiata] (2n = 11) and common bean (^^'''t^JJ''^""'' t^^J" " 
n bSng lo the subtribe Phnseolinae) exhibit a hl^^ c^s^S'S 
ioiJervation and preservation of marker or er, oSS) wS e 

A mrtm detailed analysis of homologous ^segments of soybean, common 

Xamotctilar g.nefic map to be fully exploited it is » « "^^^^^^^ 
rate Dosltions of genes or QTLs in an unambiguous manner Often, this can be ac» 
SSSonlyf^^^^^ 
at a time. However, Shoemaker and Specht (1995) integrated 18 genes into the map
in a single experiment. This was done through careful 'stacking' of mutations into
parents to be used in constructing the mapping population. Because many of these
genes had been put into the classical genetic map, this study resulted in half of the
classical genetic linkage groups being integrated into the molecular genetic map in
a single experiment. Today, more than 60 loci for qualitative traits have been placed
onto molecular maps.

Using an integrated map containing more than 800 markers and combining
data from nine different populations, extensive homoeologous relationships were
detected using RFLP hybridization techniques (Shoemaker et al., 1996). The
average size of these internal duplications is approximately 45 cM, with some
duplicated segments covering more than 100 cM. These authors also observed nested
duplications that suggested at least one of the original genomes of soybean may have
undergone an additional round of tetraploidization in the far distant past (Shoemaker
et al., 1996).



6-2 DNA MARKERS AND MOLECULAR GENETIC LINKAGE MAPS

The development of molecular genetic maps based upon DNA sequence
polymorphisms was initiated by the suggestion that restriction fragment length
polymorphisms (RFLP) could serve as an approach for the development of numerous
DNA markers (Botstein et al., 1980). The application of RFLP technology to
numerous animal and plant species began shortly thereafter. Subsequently, the
availability of the polymerase chain reaction (PCR) (Mullis et al., 1986) as a tool to
detect sequence polymorphism led to the development of numerous additional classes
of DNA markers. These included (i) microsatellite or simple sequence repeat (SSR)
markers (Litt and Luty, 1989; Weber and May, 1989), (ii) random amplified
polymorphic DNA (RAPD) (Williams et al., 1990) or arbitrary primer PCR (AP-PCR)
markers (Welsh and McClelland, 1990), (iii) DNA amplification fingerprinting
(DAF) markers (Caetano-Anolles et al., 1992), and (iv) amplification fragment
length polymorphism (AFLP) markers (Vos et al., 1995).

6-2.1 Restriction Fragment Length Polymorphisms-Based
Genetic Linkage Maps

The first demonstrations of RFLP in soybean were by Apuya et al. (1988) and
Keim et al. (1989) and in 1990 the first RFLP-based map of the soybean genome
was published (Keim et al., 1990). To maximize molecular diversity, Keim et al.
(1990) constructed their map using a mapping population derived from a cross of
cultivated x wild soybean (Table 6-1). This map was developed jointly by the
USDA-ARS and Iowa State University with support from the American Soybean
Association and saw further expansion during the 1990s with the addition of more
than 350 RFLP loci (Shoemaker and Olson, 1993) (Table 6-1). Concurrently, the
DuPont corporation (Rafalski and Tingey, 1993) developed an extensive RFLP map
with more than 600 loci. Like the USDA-ARS/Iowa State map, Rafalski and Tingey
(1993) relied upon a cultivated x wild soybean cross to increase the relatively low
level of RFLP present in cultivated soybean. However, a large proportion of the loci
on these two maps would not be expected to segregate in crosses among cultivated
soybean genotypes. For example, Shoemaker and Specht (1995) used 358 RFLP
markers from the G. max x G. soja USDA/Iowa State map to genotype progeny
derived from a cross of isolines of the cv. Clark and Harosoy. A total of 118 (33%)
were polymorphic in the Clark x Harosoy population. In a previous report, Keim
et al. (1992) analyzed 38 diverse soybean genotypes with 132 RFLP probes and
found 31% to be monomorphic and further, that more than two alleles were detected
at only three RFLP loci. In addition to the relatively low level of polymorphism,
another complicating factor with the use of RFLP in soybean is the duplicated
nature of the soybean genome, to which RFLP probes will hybridize on an average
of 2.55 times (Shoemaker et al., 1996). The duplicated nature of the genome results
in multiple banding patterns with most RFLP probes. One hybridizing fragment may
be mapped in one population and a different or an additional band in another. This
requires that an RFLP locus be defined not only by the probe and restriction
enzyme being used, but also by the molecular weight of the segregating band(s).
Despite these complications, numerous successful analyses designed to discover QTL
and characterize genetic variation in soybean germplasm were conducted and
reported using RFLP probes from the USDA/Iowa State RFLP map. The details of
this and other similar maps and large amounts of related information can be accessed
on the World Wide Web in SoyBase, the USDA-ARS Soybean Genome Database
(http://soybase.agron.iastate.edu/).

6-2.2 Simple Sequence Repeat Markers

The desire for soybean DNA markers with greater polymorphism was
stimulated by the discovery of high levels of allelic variation associated with
microsatellite or SSR markers in human (Litt and Luty, 1989; Weber and May, 1989).
The fact that SSR markers are PCR based rather than hybridization based was
another attractive feature of this DNA marker system. In the early 1990s, two research
groups published similar reports demonstrating the high levels of polymorphism,
co-dominant inheritance, and the locus specificity of SSR markers in soybean
(Akkaya et al., 1992; Morgante and Olivieri, 1993). Akkaya et al. (1992) found as
many as eight SSR alleles at one locus in a set of 38 G. max and five G. soja
genotypes. Subsequent reports of SSR allelic variation in cultivated and wild soybean
(Cregan et al., 1994; Maughan et al., 1995; Morgante et al., 1994; Rongwen et al.,
1995) detected very high levels of allelic variation, including one locus with 26
alleles among a group of 91 cultivated and five wild soybean genotypes. In addition,
data analyses suggested little evidence of the clustering of SSR loci in the soybean
genome (Akkaya et al., 1995). Because of their high levels of polymorphism,
single locus nature, and random distribution in the genome, it was concluded that SSR
markers would provide an excellent complement to RFLP markers for use in
soybean molecular biology, genetics, and plant-breeding research. The major drawback
to SSR markers is the high cost of development, which requires firstly the
discovery of SSR motifs and secondly, knowledge of the flanking sequence to permit the
design of locus-specific PCR primers. An additional technical difficulty associated





242 



SIIOEMAKBBETAL. 



with SSR technology is the frequent need to distinguish alleles that vary by only
one or a few repeat units in size.
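
The two development steps named above (finding an SSR motif, then recovering the flanking sequence needed for primer design) can be illustrated with a short Python sketch. It is a simplified illustration under stated assumptions, not the procedure used by the cited authors: the function name `find_ssrs`, the minimum of five repeat units, the 20-base flank length, and the example sequence are all hypothetical choices made for the example, and only di- and trinucleotide motifs are searched.

```python
import re


def find_ssrs(seq, min_repeats=5, flank=20):
    """Find di-/trinucleotide repeats and report their flanking sequence.

    Returns a list of (motif, repeat_count, start, left_flank, right_flank).
    """
    seq = seq.upper()
    # Group 2 captures a 2- or 3-base motif; the backreference requires it to
    # repeat at least (min_repeats - 1) more times in tandem.
    pattern = re.compile(r"(([ACGT]{2,3}?)\2{%d,})" % (min_repeats - 1))
    hits = []
    for match in pattern.finditer(seq):
        motif = match.group(2)
        if len(set(motif)) == 1:
            continue  # skip homopolymer runs such as AAAAAAAAAA
        count = len(match.group(1)) // len(motif)
        left = seq[max(0, match.start() - flank):match.start()]
        right = seq[match.end():match.end() + flank]
        hits.append((motif, count, match.start(), left, right))
    return hits


# Hypothetical sequence: an (AT)8 repeat between two short unique flanks.
example = "GGCCTTAGC" + "AT" * 8 + "CCGTAAGGT"
hits = find_ssrs(example)
```

The repeat count reported for each hit is the quantity that differs among alleles, which is why alleles separated by a single repeat unit differ by only two or three bases in PCR product size.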

6-2.3 Restriction Fragment Length Polymorphisms
and DNA Amplification Fingerprinting Markers

In contrast to SSRs, RAPD or AP-PCR markers require no prior knowledge
of DNA sequence and as a dominant marker, alternative alleles are detected
simply as the presence or absence of a PCR product. Thus, genotypes can be readily
determined using agarose gel electrophoresis without the need for more
sophisticated systems to detect allelic variation. Nonetheless, RAPDs have not been widely
used in soybean genetic map development. The exception is the RFLP/RAPD map
constructed by Ferreira et al. (2000), which incorporated 106 RAPD markers into
a framework of 250 existing RFLP loci using a subset of RILs from the PI 437654
x BSR 101 population (Table 6-1). Like RAPD markers, DNA amplification
fingerprint or DAF markers are amplified using a single arbitrary primer
(Caetano-Anolles et al., 1992). The differences between RAPD and DAF technology are the
shorter arbitrary primer in DAF vs. RAPD (generally eight nucleotides), and the
use of polyacrylamide gel electrophoresis with silver staining in the case of DAF
versus agarose gels for RAPDs. Predigestion of genomic DNA with a restriction
enzyme before PCR amplification is sometimes used to optimize DAF
amplification products. A limited number of DAF-generated polymorphisms were mapped
in the Univ. of Utah, Minsoy x Noir 1 RIL population (Prabhu and Gresshoff, 1994).
No genetic maps were developed in soybean using DAF markers.

6-2.4 Amplification Fragment Length Polymorphism Markers

Like RAPD markers, the generation of AFLP requires no prior knowledge
of DNA sequence and as a result, numerous marker loci can be rapidly developed.
The AFLP markers are generated based on restriction fragment length
polymorphisms. The DNA adaptors are ligated to the ends of restriction fragments and PCR
primers homologous to the adaptors are used to amplify selected subpopulations
of the restriction fragments. Selectivity results from the addition of two or three
arbitrary nucleotides to the 3' ends of the PCR primers. One of the largest available
AFLP maps of any plant species was developed in soybean (Keim et al., 1997).
These loci were mapped in a subset of 42 RILs from the 330 RlL P 437654 x 8SR 

the USDA/Iowa State map has 1004 loci, most of which arc Rin<Ps andSSRs (Thble 
tt) The alo^ noted significant clustering with AFLP markers that were gen- 
^at;dulgS..RI/MMrestriciionen^meH(^^^ 
centofUte loci displayed dense clustering. In contrast, "f^^^^^ 
loci did notduster. Psti Is known to be sensitive to cytosme meUiylation and ,n re- 
lation to FCTRm>fiI-generatcd APUPs, Uiose produced using PstVMM appeared 
to ; nit nS cltiering. Keim et al. d W) also noh«i that despite numcr- 
ous n nrJ^r loci and dte framework of 165 RFLP loci. 1 1 of 28 linkage groups cojjd 
Sah^lVndwidiahomologous linkage groupo^ 

?ctuU is indlontive of one shortcoming of AFLP mnrkers. which is the difficulty of 
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eve;;. '^-^^lt'SS"o^l^^^^^ AFLP« hnve proven 

without tho need P'^^'?^^ ^ Z bulked scgregnnt ftnalyBis 

(Michdmorect al., ') '^V,^.^ can then be a.w5- 

6-2.5 A Simple Sequence Repeat-Based Soybean Genome Map
The dcvclopmeni and mapping of a large b.i of soybean SSR mwkem was 

:str.r= 

of addidonnJ SSR markers. The virtoe ^-^^-f "^7,7/^^^^^^^ ^^..kers dc- 

^ n thiiYi was the Univers ty ol Nebruitka Clw-K X Hnrohoy i!iwuM(- H 1 
6.1). The luKl was the U. ve. y gg^t^ n^^cd. to nsm- 

anu \r the 10+ linknccftrdups derived from OiiCli of mo 

populations (T^'^l''^^ na " of fiftvbcan chromosomes, Likewise, clns- 

piwumed to correspond to *e 20 Paira ot hoyocan cnr . „5j,ocintecl with 

icnl Hnkugogroup « W^^^^^^ frlef d SLl loci and Bl 

^^T/^llZlT^S^i^^^ oth.r clnsHlcal loci. Reports in the liter- 
Ihat hnci notpreviouMy teen nn^^^ 1994) mapped 

proximniciy "r"-* , .^j^^qv Tnfonnation rclnt ng to the sequ^^nce of FCR 
tocol for their amplification can be obtained on the SoyBase website. Additional
information relating to SSR allele sizes in a set of 10 diverse soybean genotypes,
as well as gel images of the alleles produced with the same 10 genotypes, is
available on SoyBase. Data relating to the mapping of more than 600 SSR in the
University of Utah Minsoy x Noir 1, Minsoy x Archer, and Archer x Noir
recombinant inbred line populations can be obtained on the Lark Lab, Univ. of Utah website
(http://www.larklab.4biz.net/).
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HflP (DO) dfv ilonolad using (h«t Ar|/]I Normol fonl BBH lad ott \t\ AHnt Uold font, 
Fig. 6-2. Consensus soybean molecular linkage group F defined using three mapping populations: the
USDA/Iowa State Univ., G. max x G. soja population; the Univ. of Utah, Minsoy x Noir 1
population; and the Univ. of Nebraska, Clark x Harosoy population. From Cregan et al. (1999).
The definition of 20 consensus linkage groups with an average of 30
locus-specific SSR markers per group (Cregan et al., 1999) provided a resource that has
facilitated the rapid alignment of linkage groups in existing or newly created
linkage maps with the consensus linkage groups in the SSR-based genome map. In four
instances, a relatively small number of SSR markers ranging from an average of
one to as many as five per linkage group was used to associate linkage groups with
corresponding linkage groups on the SSR-based soybean genome map. These
included two fairly extensive maps, one with 792 markers (Wu et al., 2001) and
another with more than 500 markers (Yamanaka et al., 2001) (Table 6-1). Details of
die Mlsuzudaizu x Moshidou Gong 503 map (YamanakueCalu 200 1 ) arc available 
on the World Wide Web (hU;p;//dna-refi.kav;usja»onjp/8/2/0a/HTMLA/), [Vtatihcws 
ct al (200 1 ) Hucccssfully po$ition<jd cPNA and genomic clones on consensus link- 




Iho fourth example 
map. 

6-2.6 Single Nucleotide Polymorphism Markers

Single DNA base changes between homologous DNA fragments plus small
insertions and deletions, collectively referred to as single nucleotide
polymorphisms (SNPs), are by far the most abundant source of DNA polymorphisms in
humans (Collins et al., 1998; Kruglyak, 1997; Kwok et al., 1996) and mice (Mus
musculus) (Lindblad-Toh et al., 2000). In humans, these variations are estimated to occur
at a frequency of about one per 1000 bp when any two homologous DNA segments
are compared (Cooper et al., 1985; Kwok et al., 1996). In plants, relatively limited
data on the frequency of SNPs are available. Cho et al. (1999) compared the DNA
sequence of more than 500 kbp of the two Arabidopsis thaliana genotypes Columbia
and Landsberg erecta and detected one SNP every 1034 bp. However, most other
reports have indicated much higher levels of sequence variation in Arabidopsis
(Kawabe et al., 2000; Kawabe and Miyashita, 1999; Kuittinen and Aguade, 2000;
Purugganan and Suddith, 1999). In maize (Z. mays ssp. mays L.), Tenaillon et al.
(2001) sequenced more than 14 kb of coding and noncoding DNA from 21 loci on
chromosome 1 in each of 25 genotypes and discovered a mean of 9.6 SNPs per kbp
between any two randomly selected genotypes. In soybean, SNP DNA markers are
already in use in industrial breeding programs (Cahill, 2000) using allele-specific
hybridization (ASH) for SNP detection similar to the procedure described by
Coryell et al. (1999). It is thus apparent that SNP markers are likely to have an
important role in the future of soybean genome analysis and manipulation.
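
Pairwise frequencies such as "one SNP every 1034 bp" or "9.6 SNPs per kbp" come from counting differences between two aligned sequences. The following Python sketch shows that calculation in its simplest form. It is illustrative only: the function name `snps_per_kbp` is hypothetical, the two sequences are assumed to be already aligned to equal length, and alignment gaps and ambiguous bases are simply skipped, so small indels are not counted here even though the text includes them under the SNP heading.

```python
def snps_per_kbp(seq_a, seq_b):
    """Count single-base differences per 1000 compared bases in an aligned pair."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differences = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in "-N" or b in "-N":
            continue  # skip alignment gaps and ambiguous bases
        compared += 1
        if a != b:
            differences += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return 1000.0 * differences / compared


# Hypothetical 1000-bp pair differing at a single position.
reference = "ACGT" * 250
variant = "T" + reference[1:]
rate = snps_per_kbp(reference, variant)

# Converting "one SNP every 1034 bp" to a per-kbp frequency.
arabidopsis_rate = 1000.0 / 1034
```

Expressed per kbp, one SNP every 1034 bp is roughly 0.97 SNPs per kbp, which puts the Arabidopsis and maize figures quoted above on the same scale.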

Until recently, the comparison of variation in DNA sequence among soybean
genotypes has been confined to the assay of single genes or DNA fragments,
generally with the purpose of defining gene structure or function or evolutionary
relationships. For example, Scallon et al. (1987) compared 3543 bp of the Gy4 glycinin
gene plus flanking DNA in two genotypes and found three SNPs. Zakharova et al.
(1989) compared 789 bp of cDNA sequence encoding the A3B4 glycinin subunit
of the soybean cv. Mandarin, Rannaya-10, and Mukden and found two single
nucleotide polymorphisms. Xue et al. (1992) discovered 20 single base changes and
four indels (insertion-deletions) in a comparison of 2942 bp of the Gy4 gene and
flanking DNA in the soybean genotypes Forrest, Raiden, and Dare. Zhu et al.
(1995) sequenced 400 bp of RFLP probe A-199a in the cv. BSR-101 and
A81-356022 and the G. max germplasm line PI 437654 and found a total of nine SNPs.
To permit the comparison of SNP frequency among loci of varying length and
between populations that vary in size, measures of nucleotide diversity such as π
(Tajima, 1983) and Watterson's theta (θw) (Watterson, 1975) have been devised that
are standardized for length and adjusted for sample size. Nucleotide diversity from
the four aforementioned studies range from θw = 0.85 SNPs/kbp (Scallon et al.,
1987) to 15 SNPs/kbp (Zhu et al., 1995). The wide diversity of values
suggested that a systematic study of SNP frequency in soybean was needed.
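
The standardization for length and sample size can be made concrete with Watterson's estimator, θw = S / (a_n · L), where S is the number of polymorphic sites, L the sequence length, and a_n = 1 + 1/2 + ... + 1/(n-1) for n sampled sequences. The Python sketch below is a minimal illustration; the function name is hypothetical, and it assumes that each reported SNP corresponds to one segregating site and that the number of genotypes compared equals the number of sequences sampled.

```python
def watterson_theta_per_kbp(segregating_sites, length_bp, n_sequences):
    """Watterson's theta expressed per kbp of sequence."""
    if n_sequences < 2 or length_bp <= 0:
        raise ValueError("need at least two sequences and a positive length")
    # Harmonic correction for sample size: a_n = sum of 1/i for i = 1 .. n-1.
    a_n = sum(1.0 / i for i in range(1, n_sequences))
    return segregating_sites / (a_n * length_bp / 1000.0)


# Three SNPs in 3543 bp compared between two genotypes (Scallon et al., 1987).
theta_scallon = watterson_theta_per_kbp(3, 3543, 2)

# Nine SNPs in 400 bp compared among three genotypes (Zhu et al., 1995).
theta_zhu = watterson_theta_per_kbp(9, 400, 3)
```

Under these assumptions the two calculations give about 0.85 and 15 per kbp, matching the range quoted in the text.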

In recently completed work to assess the SNP frequency in soybean, a group
of 25 soybean genotypes that represented 18 ancestral varieties from which North
American soybean plants are derived (Gizlice et al., 1994) as well as seven parents
of RIL mapping populations was analyzed (Zhu et al., 2003). A total of more than
28.5 kbp of coding sequence and 37.9 kb of noncoding (introns, 3' and 5' UTR,
and flanking genomic sequence) from 116 genes was sequenced in each of the 25
genotypes. The SNP frequency in coding and noncoding DNA was 1.98 kbp^-1 and
4.19 kbp^-1, respectively. Nucleotide diversity was θw = 0.53 and 1.11 in coding and
noncoding sequence, respectively. The mean θw of 0.97 was similar to reports of SNP
frequency in humans (Wang et al., 1998; Cargill et al., 1999; Halushka et al., 1999)
and 5- to 10-fold lower than reports in maize (Remington et al., 2001; Tenaillon et
al., 2001). Despite the relatively low frequency, SNPs were discovered in or around
74 of the 116 genes for which sequence data were obtained. These data suggested
that SNP discovery focused on noncoding sequence, where greater sequence
polymorphism is present, will permit successful SNP discovery in soybean. One
obvious target for SNP discovery is 3' UTRs of cDNAs. Discovery and mapping of SNPs
in 3' UTRs will not only create useful genetic markers but will position the
corresponding expressed gene on the genetic map. The resulting transcript map will
provide a powerful tool to associate QTL with candidate genes. An alternative approach
to SNP discovery is the sequence analysis of polymorphic AFLP bands or adjacent
polymorphic sites as suggested by Meksem et al. (2001). This approach was
successful in discovering numerous indels and SNPs in soybean.



6-3 TECHNOLOGIES FOR DNA MARKER ANALYSIS

6-3.1 Restriction Fragment Length Polymorphisms
and Random Amplified Polymorphic DNA

The electrophoretic separation of DNA fragments, transfer and
immobilization on a membrane, and detection of specific sequences was outlined by Southern
(1975) and is the basis for the detection of RFLP. Numerous descriptions of the
procedure are available (Sambrook et al., 1989; Grant and Shoemaker, 1997) and as a
result there is little need for detailed description here. Likewise, there is little need
to describe the analysis of RAPD markers. As indicated earlier, two of the
important virtues of RAPD markers are ease of use and broad applicability. Almost
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Without exception, RAPD I'rnem-jnts ore walyzed ustns ngaroae SeUlfiJIJP*'"!^: 
wi>ltdti 10Q7\ Hitch resolution ORWOses such nsMetaphor ngitfo e (FMC "lo- 

MA) are tJso used for RAPD frngmont nnalys.s. 

6-3.2 Amplification Fragment Length Polymorphism

me AB1 PRISM 377 ONA and the Licor Global IR^ DNA Analy/,cn 
6-3.3 Simple Sequence Repeat

Because SSR nllokB are defined by th. ""fnb*'';^^ "J "'J^^ Z''^^, ^ 

either slab gel or capiHaiy cl<Jcti'opho«9i8. A nficesHuniy on 
nWe options forSSUttllcle sizing follows. 

6-3.4 Agarose Gel Electrophoresis

preceding patftgraph. 

6-3.5 Polyacrylamide Gel Electrophoresis

T -.-nih, mn.i -^qR allele Bizinft wns performed on high-resoliition denntur- 
ing with the DNA-specific stains SYBR-green or SYBR-gold (FMC Bioproducts)
and detection on a UV transilluminator. Numerous other gel and capillary
electrophoresis systems are available for high throughput automated or
semi-automated SSR allele sizing. These include gel and capillary electrophoresis systems
from Applied Biosystems (Foster City, CA), LI-COR, Inc. (Lincoln, NE),
Beckman Coulter (Fullerton, CA) and Amersham Biosciences (Piscataway, NJ).
Advantages of these systems over standard sequencing gels for sizing
SSR-containing PCR products are (i) single-base resolution over a wide size range from 7? to
500 bases, (ii) automated sizing, (iii) automated data output, and (iv) elimination
of radioactivity. Numerous soybean researchers have reported the use of the ABI
PRISM 377 DNA for SSR allele sizing in soybean (Diwan and Cregan, 1997; Mian
et al., 1999; Song et al., 1999; Narvel et al., 2000a, 2000b).

6-3.6 Capillary Electrophoresis

The same fluoresceni' chemistry employed In Ihc Pcrlcin.Elmer DNA sc- 
quencers described above is used in capillary «'f trophoresis syster^^^^^^^ 
or 96 cdoillary capacity available from Pertan-Bimer Applied Biosysiems. Beclc- 
n nn Co liter Inc, manufactures an eight capillary machine and Amcreham BiO- 

tlplex SSR allele sizing. 

6-3.7 Mass Spectrometry

Braun et al. (1997) proposed the use of matrix-assisted laser desorption/ionization
time-of-flight mass spectrometry (MALDI-TOF) for SSR allele sizing.
This system requires the annwling of a single detect on primer wi Am nj^^^^^^^ 
of the 3'-end of the SSR. A DNA polymerase extends this primer through the SSR
by primer-directed DNA synthesis. A dideoxynucleotide triphosphate is
included to terminate the reaction at a point past the 5'-end of the SSR. Extension
reactions from different SSR alleles yield products that differ by the number
of bases in the alleles. These products are resolved using MALDI-TOF mass
spectrometry.

6-3.8 Single Nucleotide Polymorphisms

The promise of SNP markers is the efficiencies in cost per data point and speed
of data acquisition that will result from the technological innovations likely to be
forthcoming from intensive research that has and continues to be focused on SNP
detection. Numerous reviews of these technologies are available (Brookes, 1999;
Gut, 2001; Kwok, 2000; Kwok and Chen, 1998; Shi, 2001; Syvanen, 2001). There
are four basic approaches to SNP detection. These include (i) allele-specific
hybridization (ASH) or allele-specific oligonucleotide hybridization, (ii) single-base
extension (SBE) or minisequencing, (iii) the oligonucleotide ligation assay (OLA),
and (iv) allele-specific cleavage of a "flap probe". These approaches have been
combined with numerous different detection technologies. A number of these are
summarized in Table 6-2. In many cases a SNP-containing PCR product is the neces-
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'^\sm) ^ ^^^^'"^ ^'^^^"^ oi'rcchnnlogicH used in Che tloicclion oU'inglu imclcoUdo i)olymor|5hlanw 

Allele specif-ic hybriclizfliiftn 

•*¥"Iab{5ll{id ppotHJ hybrkiiwd to u SNP-cnnmlnina 

fragment immobilized oii u mombr«no 
5' nuclcttfio assay 
Molcculttr beaqnns 

mecuronic dot hiot on pfimlcnnductnr microcbipp 
BIcciriQ field dcnniurntion 
Affymotrix ollgochlp 

MwtujAdcs cleaved from (illcla Hpeclfiq oligftji doii^tcd 

vlfl indBS epwtromoiry 
Riindomly ordered fiberoptic geno mys 

Flow cylomotry 

Dynmnio nllalg-fipeqiric hyfarklizatlon (DASH) 
Slnglo-hdw? cxienfllott or njinippquencin^ 

iJiiiijIc tecxiQtitiion-Ua may on iriufla aliUcs 

(SBE-TAQS) 
Muti'ix^uEBisfed laser dqRorpiion ionly.mlun-'nmB 

of High! (MALDI-TOP) mm ppceiromcuy 
PluonjRccai dideoxynuDlooiidc tripJitmphQics fddNTPs) 
Flowcyiornotry 

Pyro8cquondnj« (muliiplc hm cxicnslon) 

Donniuringljiuh perforinunco liquid chroninlouriiphy 

HnuoneRcciico polar(7.nlion 
Otigonuc^iccMicJo llj^uiinn may 

Riitling circle ^implificution 

I^iow oyromeuy 
AUi?lq-'fl|!>eeific oloovnge of ii /'tlnp proho" 



Coryell ot nl, (1999) 
Ueqinl om) 
TVagt 01 ul. (1998) 

SnsnOWBki til Ml, (1997) 
%oliikycl (d.(l999) 

KbkoriM oul. (21)00) 
Slccmcrscul. (2000) 

Yuo! nl (2001) 
Wncftctal (20DI) 



Hlrschhomei ul. (2000) 

UUIe PI qI. (1997); Roifti cl nl, (1098) 
LindWDd-'lbhoml (20(Xl) 
Chon u( til (2QD0) 
Aldcrborn ci ul. (200*)) 
Hooifondoomct ul.(l999) 
Chcncl ul.(1999) 

Qi oHil (2001) 
Innnonpci iil.(2000) 
Foi-scsi ul (2000) 



aary target for detection; however, in mmt inatancea genomic DNA H the target unci 
a PCR step is not required. As of December 2002, nearly 5 million putative human
SNPs had been submitted to dbSNP (http://www.ncbi.nlm.nih.gov/SNP/index.html),
the National Center for Biotechnology Information, National Institutes of Health,
SNP database. Clearly, the human genetics community is focusing on SNPs as a
major research tool for drug discovery, diagnostics, gene discovery, population
genetics, and other applications. The costs of SNP detection are likely to decrease while
the ease and speed of detection improve. The plant genetics community stands to
be a beneficiary of the investments in technology being made by human geneticists.



6-4 GENE DISCOVERY

6-4.1 Expressed Sequence Tags

For nearly two decades random sequencing of gene transcripts has been
recognized as a simple and efficient method of identification of many of the expressed
genes in an organism (Putney et al., 1983). These sequences, known as Expressed
Sequence Tags (ESTs), have become a valuable and efficient method for gene
discovery (Sterky et al., 1998; Hillier et al., 1996; Marra et al., 1999). When sampling
is random, the frequency of appearance of any given EST permits the identifica-
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p^JactH have been r^^^^^^^^^^ S^^::XflBrLL rapa (l>eldnen8i« 

^'^•''IhttS al (2002) reported on n global. multi-tlBsu. BST analyBis for 
, M^r,hJn 120 000 E^^ were genernted rrom more than SO cWA li- 
soybean. '^'^^^^f^^'JJ^^^^^^^ 

brorics fepreseMlng a wide: range^^^^^^ demonstrate correlated potierns 

«nvkot»memnlcot,dit.onB.This tt^dy w^^^^^^ 

re„rrsSrS«^^ 
model crop legume, 

6-4.2 Genome Sequencing
sequencing lU currently "".^f *"y,I",„rt ^ procress (Theologis et 

soybean genome (Ma«k a nl ,. 20Ol>. P ^ to repetitive DNA, while a l,t- 

Roqueitow sampled near SSR marker!. correBpgnuc . . h (putnrtve by- 

cle\.o,^ than 18% of the 

of the sequences sampled had "6" , of BAC-end sequences 
atound oa nnchonjd locus provided the ^?P^""^^ microsynieny were obsen^ed 
,ween aoyboan '-^^SSi'ZH^^^^^ <^^^'^ 
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bolio rcgultttlon of ihe orgnnlsro, 

6-5 FUNCTIONAL GENOMICS
n«.^tf hmuLTh uschnolOBlcs stimulated by tlie human genome project nre fun- 

the broadest view, functional genomics Is defined as the process of gen- 
pre.s8.on f'^f'^^i'T^!^^^^^ fo underfitund the function of genes. Bio.n- 

SSnr^ and information proeeaoing (He iter „nd .^J^Je'^Eof ftrnditigd^^^ 
The new Plant Genome Program, which received $85 "Tf ""L 

in., leSramthe National Science foundation (NSF). stimulated the developmen 

S!S for gene expression analysis, gene togging, and mappmg (Wnlbot. 1999). 

6-5.1 Creating a Soybean "Unigene" Set
One goal of the NSF-sponsored program for "Soybean Functional Genomics"
... A.J2Tn set of 30 000 unique genes from soybean. accomplish this, the 
KSltom S cimmod ty J^^^^^^^^^ "Pnhllo EST Project" (Shoemaker et a 
St^u^ raw mnteri The mRNAathat arc more abundant mvarioua 
5Ss Jl t m^^^^ Sly W««ented in the EST collections. The ESTs are com- 

SS ln this way. longer sequences representing exprisaed genes are assera. 
SriS idenHcaSi enc^^^^^ that represent redundant clpnes are recogniwsd. Ppr 
^ aonS«; rcpSg the stSrage protein KunitJ! trypsin inhlb tor consist 
S iTf ESfSi^ a ^^^^^^^^^ «"oV2600 sequences in « cDNA libru^ from 
1 devS g c^^^^^^^^^ of soybean whereas the contig representing ^oy ean 
IpoSrgSe'lLsi^^^^ 
contigs in a non-normalized cDNA library is a rough approximation of the relative
abundance of the mRNAs within that tissue.

SenX longest clone, Jn uddlrion, the many spqucnces thoi occur only once 

llndeton^ and independent contigs that result from a computer asscn^ly ol BS Ts 
b J^ m^timTofth. number of unique genet* ift the org£»«isni. Thi. proceBs 
StoSuBc ihe number of ESTs grows, the number of unique genes 
rrSsm will coiUinue to be refined, To dnte. this number forsoybean is ex» 
cecdinSSo Shc^irakcr et nl.. 2002; Vodkin ct nl.. 2003) from a collection o|- 
mSr*! So8 000 ESTs. The AMopd^ genome sequence hus revealed ap- 
ToxlSy 26 oS gTnes (Arubidopsis Genome 
EoTiStX) ) T^i^numberofburnnneen^^ 

5o 000 sequences from the complete humrm genome sequence (liUemHUonn 
Humnn OenoTs quenchig Consortium. 2001); however, thnt. number .s Bjl 
unrdeSn^^^^^ « t^l'. 200t). Altcrnntive splicing nppe«« to resu^^ m a 
rchlmce nUlJrofproleinslnt,hehu^ 

Z dS In^mL splicing in soybeun or other higher plants hns not been 
I csflcrHowS^^ higher plants, including soybean, have extensive dup 
SSrof wc^s, Mnny Lctionnlly equivalent proteins ore encoded by small 
Smnll ierontof the surprises to come from the knowledge of the comp etc 

coSed clupllcatcd sequences (Marteinssen and McCombie. 2001). 

6-5.2 Global Gene Expression Analysis

The genomes of higher eukuiyotcs conuiin a very »»^S0 n;'« J^^^^^^^ 
,endlngup\ntheo^J^^^^^^^^^ 

H HnnVThis reualfltlon may be tightly linked for suiles ot genes (e,g.. a P««iculnr 
I SifCTv Sle gene cxpre sion paficms have been studied to under^ 

cwie exDression. This has resulted in our (jurrcnt models for gene regulation but has 
Ste7or«b I ty to understand complex regulatory r«latk.nsh.ps "Hiong g ne . 
SrecTadvances In genomics, vety large numbers of 
muknneously analyzed for their expression levels in a compm nr,,ve tftshion between 
K„S.wIcnl states using mictoarrny or bioc , , , 

ufteSueX thrJughputorglobul «"'^yt^r4LX^et 
K„„» hZv, .l^^i,.M'nied as an outm-owth of tlie human genome project (Velculchcu ct 
Sc en u , 99^ DeRisi et al., 1997; Marsholl and Hodgson 
99 t TteSh^^^^^^ m density expression arrays of cONAs on oorr.~^^ 
nylon filters with radioactive probing, (ii) "microarray" chips using fluorescent
probes, and (iii) serial analysis of gene expression (SAGE).
6-5.2.1 High Density cDNA Arrays with Radioactive Probing

High density arrays are one method for assessing gene expression. The
cDNA clones are arrayed in 384-well plates and spotted by robots onto nylon
membranes. Either the bacterial colony, plasmid DNAs, or PCR products can be
spotted. As many as 18 000 cDNAs can be arrayed on a filter of about 20 by 20 cm
using robotic technology. The hybridization method is analogous to that used for
conventional DNA or RNA blots on nitrocellulose membranes. The membranes are
generally probed with radioactive phosphorus-labeled cDNA produced by reverse
transcription of mRNA. After hybridization, the membrane is imaged using a
phosphorimager and the highly complex pattern must be read by image analysis
software. This technology can be used to select specific cDNAs that are very weakly
expressed in the library for further analysis. This "filter normalization" method is a
tool for gene discovery of more weakly expressed genes. High density expression
arrays have been used to increase gene discovery in soybean (Vodkin et al., 2000).
In addition, one can compare the expression within a single library of numerous
genes. If a "unigene" collection is used, then the relative expression of various genes
within a single sample is easily obtained from high density membranes. The
disadvantage is that without dual labeling, one cannot easily compare between two
different mRNA samples. Dual labeling is one of the main advantages of microarrays
using fluorescent detection.

6-5.2.2 DNA Microarrays or Chips to Analyze
Global Gene Expression Patterns

An alternative method to the high density filters is microarray technology
using fluorescent probes (DeRisi et al., 1997; Schena et al., 1995). In this method,
the inserts from cDNA clones are amplified by PCR with vector primers and the
amplified DNAs are arrayed onto glass microscope slides by a computer-controlled
printing device. After fixation to the slide, the DNAs on the array are probed
with fluorescently labeled cDNAs made from total mRNA of a particular tissue. Two
different fluorescently labeled probes can be used simultaneously on the same slide,
that is, one in the red range and one in the green range. The fluorescent images are
captured with a scanning laser microscope and the intensity of each spot can be
compared to standards of known concentrations to give quantitative data on gene
expression. An arrayed set of 2375 unique cDNA from Arabidopsis has been used to
examine changes in gene expression during pathogen challenge (Schenk et al., 2000)
and will be used to address important problems in many other plant systems
(Bouchez and Hofte, 1998; Mazur et al., 1999; DellaPenna, 1999; Somerville and
Somerville, 1999; Somerville, 2000).

Microarrays are most effective when all genes within an organism are
represented. Current technology for cDNA arrays is typically in the range of 5000 to
10 000 genes on glass slides. Thus, it is important to reduce the redundancy of the
cDNA libraries before they are spotted, to maximize the number of genes on the array,
by printing from the "unigene" set instead of from non-normalized libraries.
Currently, arrays of 27 000 cDNAs (9k per array) for soybean have been printed
containing low redundancy cDNA from many of the 80 cDNA libraries of the EST
project. Experiments to examine differential expression during the process of induction
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environmental stresses such aS drpueh ■ ^J^J' J. ihat difl'er in protein 

glass array. tl«t will ttUow «t iSSonS^^ 

Sns. The 3' regioa ^'^^ !'^Tfn^ mmb^n. AmmUy of the Cull- 
S Eonucleotides that S^Sp AflVmi chip tech- 

length oDNA« will '^'S'',""'?^ cSol^^^^^ '^'^'^^ "^'"^ " Pi*"" 

noloByinwWohappto)iimately20nudeOlifleB Jr6^^^^^^ expensive and nteO 

requires fuH-lengA ^'r'i^^S «B«urch on soybean so thftt 

lnlU«llon is also "^^'"tllSoi^^^^ ni^tehcd to the predicf^d OBi's. 
peptide masses or pactifil protein '■^'-l"^"^™^. gy^y to obtnin in soybean 



of soybean genes, 



Ser..«„.lys. of ,ene expression rcj-r^^^^^^^ 

tissues (Velcalescu eta ..1995). SA^^u-J ^ ^een sho^n that 10 nu. 
near the 3' end of ind^du^ >^^^^^^^ Ugated 
cleotides aniciudy ident,ip9S% ol n™ 

into concatamers mtd .ro '1^" f '"^"^^^^^^^^^ Bequenced to identify SAOE 
per clone). Hundreds or even Vho frequency of nppear- 

wgs nnd to determine fw>en.y P^™^^^^^^^^^ estimate expression levels 
i of tags in the library has b«=n;h(>wn ^^^^^''J i^,,, (o„c per cell) 
in ihcmRKA source tissue. MMSage ^t^^^^^^^ ^,^^,3 ^ 

^an be qiiantitativcly d'SW"*^ J>y f quenU^ " ^^^^ 
hlehly and moderately expressed genes ^ ^^^^^^^ y ,,quence duta base, A 
sequenced clones. SAGE analysts requires an > ..tensiVft se- 

dUDdvantage is that low "^"^"^ ^"^^^^^^^^^^^^ can be expensive. Ini- 

quencine of the \AGB g^^^^ in 132992SAGW 

6-5.3 Functional Consequences of Gene Duplication
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sy„.h J« (CHS) te.ll, 1. soyb»n « ffoS fw) Pto of to. 
Son to providing m raw mnterini tor WoWon. (en. dupfatlo.»i 

otic sysieiTiB (Wolffe und ^l"! 7^ seed coat8. The dominant form 

sicaldominuntgenetic marker thnire^^^^^^^^ 

,vFthe Iftcus Is nrwent n most commercially usea aoyouuii vououm ^ ^ 

SistsrnSJ'iS^^ 
6-6 SUMMARY



Soybean rescorch benefiUi greatly IVom n "check^of r program i" ^l^^if^^ 

t onhe n n or objectives of thnt program Is to prov de genomic tools to 
REFERENCES




^ DNA in fioytau.. a«^i.M 32.(13 1 I '"j! ^ c.cpn- 1995. Injegi'mi^rt of xm- 
Akknya. M.S . R-C Shc«m"te J-U- c4> Soi, 35: ) 439- 445 
Abstract A total of 391 simple sequence repeat (SSR)
markers designed from genomic DNA libraries, 24
derived from existing GenBank genes or ESTs, and five
derived from bacterial artificial chromosome (BAC) end
sequences were developed. In contrast to SSRs derived
from EST sequences, those derived from genomic
libraries were a superior source of polymorphic markers,
given that the mean number of tandem repeats in the
former was significantly less than that of the latter
(P<0.01). The 420 newly developed SSRs were mapped
in one or more of five soybean mapping populations:
'Minsoy' x 'Noir 1', 'Minsoy' x 'Archer', 'Archer' x
'Noir 1', 'Clark' x 'Harosoy', and A81-356022 x
PI468916. The JoinMap software package was used to
combine the five maps into an integrated genetic map
spanning 2,523.6 cM of Kosambi map distance across 20
linkage groups that contained 1,849 markers, including
1,015 SSRs, 709 RFLPs, 73 RAPDs, 24 classical traits,
six AFLPs, ten isozymes, and 12 others. The number of
new SSR markers added to each linkage group ranged
from 12 to 29. In the integrated map, the ratio of SSR
marker number to linkage group map distance did not
differ among 18 of the 20 linkage groups; however, the
SSRs were not uniformly spaced over a linkage group;
clusters of SSRs with very limited recombination were
frequently present. These clusters of SSRs may be
indicative of gene-rich regions of soybean, as has been
suggested by a number of recent studies, indicating the
significant association of genes and SSRs. Development
of SSR markers from map-referenced BAC clones was a
very effective means of targeting markers to
marker-scarce positions in the genome.

Electronic Supplementary Material Supplementary
material is available in the online version of this article at
http://dx.doi.org/10.1007/s00122-004-1602-3



Introduction 

The first soybean (Glycine max L. Merr.) genetic linkage
map of molecular markers was reported by Keim et al.
(1990). This map consisted of 26 genetic linkage groups
containing a total of 150 restriction fragment length
polymorphism (RFLP) loci and was based on an F2
population derived from an interspecific cross of G.
max (A81-356022) x G. soja (PI468916). Lark et al.
(1993) subsequently used 132 RFLP, isozyme, and
morphological markers to construct a soybean genetic
map comprised of 31 linkage groups. Shoemaker and
Specht (1995) mapped 110 RFLP, eight random amplified
polymorphic DNA (RAPD), seven pigmentation, six
morphological, and seven isozyme markers in an F2
population derived from a mating of isolines of the
important soybean cultivars 'Clark' and 'Harosoy'.
These early genetic maps were primarily based on
RFLP markers. Due to the lack of polymorphism of RFLP
loci in soybean and/or the complexity of multiple DNA
banding patterns detected with most RFLP probes, simple
sequence repeat (SSR) or microsatellite markers were
proposed for map development (Akkaya et al. 1992).
Most SSRs are single-locus markers, and many SSR loci 
are multi-allelic. These characteristics make SSRs an 
ideal marker system not only for creating genetic maps, 
but also as an unambiguous means of defining linkage 
group homology across mapping populations. In 1999, 
Cregan et al. (1999a) reported the development of 606
SSR loci which, together with 689 RFLP, 79 RAPD, 11
AFLP, ten isozyme, and 26 classical loci, were mapped to
one or more of three populations: the USDA/Iowa State
G. max x G. soja F2, the University of Utah 'Minsoy' x
'Noir 1' recombinant inbred lines, and the University of
Nebraska 'Clark' x 'Harosoy' F2 population. These three
separate maps provided useful information relative to the
consistency of marker order and genetic distance among
the different populations. The Cregan et al. (1999a) report
established, for the first time, 20 consensus linkage
groups, which were assumed to be the genetic correlates
of the 20 soybean chromosomes. In that report, a total of
412 SSR loci were positioned in the 'Minsoy' x 'Noir 1'
mapping population of 240 recombinant inbred lines. The
resulting map was approximately 2,400 cM in length, but 
contained 36 intervals of at least 20 cM, and 79 intervals 
of at least 10 cM, in which no microsatellite loci were 
positioned. Inversely, there were 67 distinct intervals with 
less than 0.01 cM of distance between two or more 
adjacent SSR markers. In some of the 67 intervals, there 
was no recombination between adjacent SSR loci. 

To develop microsatellite markers targeted to SSR-free 
regions as well as to saturate genomic regions of scientific 
interest, bacterial artificial chromosome (BAC) libraries
can be screened by DNA hybridization or by PCR to
identify clones from specific regions of the genome. New
SSR or other DNA markers can be subsequently
developed from those BAC clones, making it feasible to
discover new SSRs associated with RFLP or other
previously mapped markers. Employing this strategy,
Cregan et al. (1999b) successfully developed new SSR
markers targeted to two regions of the soybean genome 
near soybean cyst nematode-resistance loci on linkage 
groups G and A2. Genetic mapping confirmed that the 
new SSRs mapped to the correct sites in the genome. 

Genetic markers are frequently polymorphic in one 
population, but monomorphic in another. JoinMap anal- 
ysis (Stam 1993; Van Ooijen and Voorrips 2001) allows 
one to combine data from map populations in which not 
all markers are in common to obtain combined estimates 
of recombination. This approach not only increases the 
number of markers on the map, but also increases map 
precision and resolution.

In the early stages of microsatellite marker development,
genomic DNA fragments containing SSRs were
isolated from genomic libraries. More recently, EST 
sequencing projects have resulted in a wealth of sequence 



DNA information in numerous crop species including 
soybean. Some ESTs contain di- and trinucleotide-repeat 
motifs, making EST collections a potential source of 
microsatellite markers. The use of ESTs as a source of 
SSRs has been reported in a number of crop species 
including rice (Cho et al. 2000), grape (Scott et al. 2000),
barley (Kota et al. 2001), sugarcane (Cordeiro et al.
2001), and wheat (Eujayl et al. 2002).

The objectives of the work reported here were: (1) to 
evaluate the potential of soybean ESTs as a source of 
SSRs for marker development; (2) to assess the success 
with which the development of SSR markers could be 
targeted to specific positions in the soybean genome; (3) 
to develop an additional set of SSR markers to further
saturate the soybean linkage map; and (4) to create a 
consensus linkage map from five commonly used soybean 
populations using a JoinMap analysis. The creation of a 
high-density, integrated soybean linkage map with more 
precisely positioned markers would permit a better overall 
assessment of the distribution of SSR loci in the soybean 
genome. Moreover, the map would be useful for map- 
based cloning efforts and would provide a framework for 
the positioning of single nucleotide polymorphism (SNP)- 
based loci that are currently being developed from 
existing ESTs and other available sources of DNA
sequence (Zhu et al. 2003). 



Materials and methods 

Sources of SSR-containing sequences 
Random genomic DNA 

The basic procedures of cloning and identification of microsatellite-
containing, 500-700-bp genomic clones of 'Williams' soybean
DNA were described previously (Cregan et al. 1994; Akkaya et al.
1995). Primer pairs were designed for the flanking regions of repeat
motifs that consisted of either ten or more dinucleotide repeat units, 
or eight or more trinucleotide repeat units. 



Targeted SSR-marker development 

BAC clones putatively associated with specific positions in the 
soybean genome were identified either by hybridization of RFLP 
probes (Marek and Shoemaker 1997) or via PCR as suggested by
Green and Olsen (1990). RFLP probes were used in an attempt to
identify BAC clones at genome locations where RFLP loci, but no
SSR loci, were present. Conversely, SSRs were used to identify
BAC clones in an attempt to develop additional SSR markers
targeted to a specific genomic location. The details relating to the
use of BAC clones as a source of DNA for targeted-SSR
development were described by Cregan et al. (1999b).



SSR-containing repeats from ESTs 

Upon the initiation of this project (December 2000), 136,800
soybean ESTs were available in GenBank. These ESTs were
screened to identify sequences containing ten or more dinucleotide
SSRs or eight or more trinucleotide SSRs.
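
The screening criteria above (ten or more dinucleotide repeat units, or
eight or more trinucleotide repeat units) can be sketched as a
regular-expression scan. This is only a minimal illustration and is not the
procedure used in the study; the function name find_ssrs, the MIN_REPEATS
table, and the example sequence are hypothetical.

```python
import re

# Minimum repeat-unit counts taken from the text: >=10 units for
# dinucleotide motifs and >=8 units for trinucleotide motifs.
MIN_REPEATS = {2: 10, 3: 8}


def find_ssrs(seq):
    """Return (motif, repeat_count, start) for each qualifying SSR in seq."""
    seq = seq.upper()
    hits = []
    for unit_len, min_rep in MIN_REPEATS.items():
        # ([ACGT]{n}) captures one motif; \1{m,} demands m further copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_rep - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            if len(set(motif)) == 1:
                continue  # ignore mononucleotide runs such as AAAA...
            hits.append((motif, len(match.group(0)) // unit_len, match.start()))
    return hits


# Hypothetical sequence: 12 AT units, a spacer, then 8 ATT units.
example = "GGC" + "AT" * 12 + "CCGG" + "ATT" * 8 + "GA"
for motif, count, start in find_ssrs(example):
    print(motif, count, start)
```

A scan of this kind reports the repeat in whichever phase is found first from
the left, so the reported motif may be a rotation of the motif named in a
database entry.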
Primer design and examination

PCR primers were designed to the flanking regions of microsatellites
with ten or more dinucleotide and eight or more trinucleotide
repeats using the software program Oligo 5.0 (National Biosciences,
Plymouth, Minn.). Primers were synthesized by BioServe
Biotechnologies (Laurel, Md.). Each primer pair was empirically
tested for polymorphism using 'Clark', 'Harosoy', 'Jackson',
'Williams', 'Amsoy', 'Archer', 'Fiskeby', 'Minsoy', 'Noir 1',
'Tokyo', A81-356022 (G. max) and PI468916 (G. soja) genomic
DNA as templates. The first 10 of the above 12 genotypes were
described by Cregan et al. (1999a). These ten genotypes represented
a range of diversity within the cultivated soybean species.
Primers designed from the ESTs were only tested on 'Minsoy',
'Noir 1', and 'Archer'. The 32P-labelled PCR products were
analyzed on a 6% DNA sequencing gel with 30% formamide,
followed by autoradiography.



Mapping populations 

Five widely used soybean mapping populations were used for
microsatellite positioning; three of these, the USDA/Iowa State
University A81-356022 x PI468916 (MS) population, the University
of Nebraska 'Clark' isoline x 'Harosoy' isoline (CH) population,
and the University of Utah 'Minsoy' x 'Noir 1' (MN) population,
were previously described by Cregan et al. (1999a). The
University of Utah 'Minsoy' x 'Archer' (MA) and 'Archer' x 'Noir
1' (AN) RIL populations were described by Mansur et al. (1995,
1996). Newly developed SSRs were mapped to MA, MN, and/or
AN populations, and then JoinMap analysis was used on the five
populations.



DNA, isozyme, and classical genetic markers

A data set containing 1,019 SSR, 749 RFLP, 13 AFLP, 90 RAPD, 
ten isozyme, 24 classical, and 12 other markers that mapped in at 
least one of the five populations CH, MS, MA, MN, and/or AN was 
used for map integration. 



Statistical analysis 

Linkage map construction using JoinMap analysis

Linkage maps of the five mapping populations were integrated 
based on the principle described by Stam (1993) using the JoinMap 
3.0 (Van Ooijen and Voorrips 2001) program. The initial step
involved calculating the LOD scores and pairwise recombination
frequencies between markers. A LOD of 5.0 was used to create
linkage groups in the MS, MA, and AN populations, whereas a
LOD of 4.0 was used in the MN and CH populations. The five maps of
each linkage group were then integrated. Recombination values 
were converted to genetic distances using the Kosambi mapping
function. The resulting 20 linkage groups were identified using the
alphanumeric codes described in Cregan et al. (1999a).
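
For reference, the Kosambi mapping function mentioned above converts a
recombination fraction r into a map distance d = 25 ln[(1 + 2r)/(1 - 2r)] cM.
The sketch below shows this standard formula and its inverse; it is
independent of JoinMap, and the function names are hypothetical.

```python
import math


def kosambi_cm(r):
    """Kosambi map distance (cM) for a recombination fraction r."""
    if not 0.0 <= r < 0.5:
        raise ValueError("recombination fraction must satisfy 0 <= r < 0.5")
    return 25.0 * math.log((1.0 + 2.0 * r) / (1.0 - 2.0 * r))


def kosambi_r(d_cm):
    """Inverse Kosambi function: recombination fraction for a distance in cM."""
    return 0.5 * math.tanh(2.0 * d_cm / 100.0)


for r in (0.01, 0.10, 0.25):
    d = kosambi_cm(r)
    print(f"r = {r:.2f} -> {d:.2f} cM -> r = {kosambi_r(d):.2f}")
```

For small r the distance in cM is close to 100r; the correction grows as r
approaches 0.5.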



SSR marker distribution

The theoretical distribution of map distance between adjacent SSR 
markers was estimated based on the assumption of random 
distribution of markers over the total length of the linkage map. 
The goodness of fit between the observed and theoretical distribu- 
tion was tested using the Monte Carlo estimate of chi-square in 
Proc-StatXact 5 of SAS (Mehta and Patel 2002). The Monte Carlo
estimate of the exact P-value was based on a Monte Carlo sample
of size 10,000. To avoid bias, markers developed from targeted
isolation of BACs were excluded from this analysis. 
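
The idea of a theoretical spacing distribution under random marker placement
can be illustrated with a small simulation. This is a simplified sketch, not
the StatXact procedure used in the study: it treats the genome as one
continuous map rather than 20 linkage groups, and the function name, bin
settings, and replicate count are assumptions chosen for illustration.

```python
import random


def simulate_gap_counts(n_markers, map_length_cm, bin_width_cm, n_bins,
                        n_reps, seed=0):
    """Average number of adjacent-marker gaps per distance bin when
    markers are placed uniformly at random along a single map."""
    rng = random.Random(seed)
    totals = [0] * n_bins
    for _ in range(n_reps):
        positions = sorted(rng.uniform(0.0, map_length_cm)
                           for _ in range(n_markers))
        for left, right in zip(positions, positions[1:]):
            # Gaps beyond the last bin edge are pooled into the final bin.
            idx = min(int((right - left) / bin_width_cm), n_bins - 1)
            totals[idx] += 1
    return [t / n_reps for t in totals]


# Marker count and map length are the totals quoted in the text; the
# study itself excluded BAC-targeted markers, so its counts differ.
expected = simulate_gap_counts(n_markers=1015, map_length_cm=2523.6,
                               bin_width_cm=0.5, n_bins=52, n_reps=20)
print("expected gaps in 0-0.5 cM bin:", round(expected[0], 1))
print("expected gaps in 0.5-1.0 cM bin:", round(expected[1], 1))
```

Observed bin counts that exceed these expectations at short distances would
point to clustering, which is the comparison the goodness-of-fit test
formalizes.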



Results 

Development of SSR markers from EST 
and genomic DNA sequences 

Dinucleotide and trinucleotide SSRs were identified in
EST and BAC end sequences from GenBank, BAC
subclones, and from clones of genomic libraries. The
minimum length criteria were ten or more repeat units for
dinucleotide repeats and eight or more for trinucleotide
repeats. A total of 420 new SSR loci were developed to
add to the 606 SSR loci published by Cregan et al.
(1999a). Among these 420 SSRs, 24 were developed from
EST sequences, five from GenBank BAC end sequences,
127 from DNA of BAC subclone libraries intended to
target specific map positions, and 264 from genomic
libraries. Primer pairs designed for sequences with
ATT/TAA, AT/TA, CT/GA, and various other repeat
motifs numbered 110, 276, 12, and 22, respectively.

Of the 136,800 soybean EST sequences examined, 75 
contained dinucleotide repeats of ten or more, and 58 
ESTs contained trinucleotide repeats of eight or more. 
The average percentage of ESTs containing the minimum 
number of repeats was thus less than 0.1%. Of the 133 
primer sets designed for the EST-derived SSRs, just 24
(18.0%) amplified polymorphic products among the
genotypes of 'Minsoy', 'Noir 1', and 'Archer' (Table 1).
In contrast, over the course of several years of SSR-
marker development in soybean, 824 (43%) primer sets
designed for SSRs derived from genomic libraries were
polymorphic among these three genotypes. This
proportion was significantly higher than the observed
polymorphism rate from the EST-derived primer sets (t=6.34,
P<0.01). The mean length of di- and trinucleotide repeats
was also significantly shorter (t=5.7, P<0.01 and t=9.3,
P<0.01 for di- and trinucleotide repeats, respectively) in
the EST-derived SSRs compared to the SSRs from 
genomic DNA sequences (Table 1). 
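
The proportions quoted in this paragraph follow directly from the counts in
the text and Table 1, as the short calculation below shows. The pooled
two-proportion z statistic at the end is included only as an illustration of
how such a comparison can be made; it is an assumption of this sketch and
need not match the test statistic reported above.

```python
import math

est_total = 136_800              # soybean ESTs screened
est_primers = 75 + 58            # dinucleotide + trinucleotide EST primer sets
est_poly = 24                    # polymorphic EST-derived primer sets
gen_primers = 693 + 1_211        # genomic primer sets (Table 1)
gen_poly = 283 + 541             # polymorphic genomic primer sets (Table 1)

frac_with_ssr = est_primers / est_total
est_rate = est_poly / est_primers
gen_rate = gen_poly / gen_primers

# Illustrative pooled two-proportion z statistic (not the paper's test).
pooled = (est_poly + gen_poly) / (est_primers + gen_primers)
se = math.sqrt(pooled * (1.0 - pooled) * (1.0 / est_primers + 1.0 / gen_primers))
z = (gen_rate - est_rate) / se

print(f"ESTs with qualifying SSRs: {100 * frac_with_ssr:.3f}%")
print(f"EST-derived polymorphism rate: {100 * est_rate:.1f}%")
print(f"Genomic-derived polymorphism rate: {100 * gen_rate:.1f}%")
print(f"Illustrative z statistic: {z:.2f}")
```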



Table 1 Means and standard deviations (SD) of repeat numbers in simple sequence repeats (SSRs) obtained from either ESTs or from
genomic DNA sequences

               Primers designed to SSRs from EST sequences     Primers designed to SSRs from genomic DNA sequences
Motif          No. of        Mean(±SD)      No. of             No. of        Mean(±SD)      No. of
               primer pairs  repeat length  polymorphic loci   primer pairs  repeat length  polymorphic loci

Dinucleotide   75            18±6.7         14 (19%)           693           24±7.4         283 (40%)
Trinucleotide  58            11±2.8         10 (17%)           1,211         16±5.9         541 (45%)

Table 2 Number of markers mapped to each linkage group and linkage group length in Kosambi mapping distance

Linkage  No. SSR             No. RFLP            No.    No.    Other  Total  Length
group    Previously  New     Previously  New     RAPD   AFLP                 (cM)
         mapped              mapped

A1       27          23      36          1       -      -      -      87     102.3
A2       37          27      44          2       2      -      4      116    165J
B1       19          16      32          1       2      -      1      71     131.8
B2       24          12      38          4       6      2      2      88     120.9
C1       21          22      19          4       -      -      4      70     135.6
C2       35          18      41          3       2      -      1      100    157.9
D1a      39          14      33          4       5      -      6      101    121.0
D1b      30          29      18          1       1      1      2      82     138.0
D2       39          21      18          1       4      1      3      87     133.9
E        28          15      42          5       11     -      2      103    71.3
F        40          24      37          4       4      1      3      113    151.0
G        36          27      50          3       12     -      1      129    116.8
H        21          17      34          7       3      -      2      84     124.0
I        21          19      30          2       2      -      2      76     125.2
J        22          28      31          12      5      -      -      98     91.0
K        40          19      22          2       4      1      4      92     117.0
L        31          21      41          2       2      -      2      99     115.1
M        25          26      22          2       2      -      1      78     142.2
N        24          21      25          4       4      -      4      82     116.7
O        36          21      30          2       2      -      2      93     146.4
Total    595         420     643         66      73     6      46     1,849  2,523.6



Targeted SSR marker development 

Of the 127 SSR markers developed from BAC subclone
libraries, 91 originated from BAC clones identified by
existing RFLP probes and 36 from BAC clones identified
via existing SSR markers. However, only 36 of the 91 
(39.6%) compared to 23 of 36 (64%) markers subse- 
quently mapped to the genomic regions to which they 
were targeted. 



Mapping of the SSR markers 

A JoinMap analysis of the 1,019 SSR, 749 RFLP, 13 
AFLP, 90 RAPD, ten isozyme, and 30 other markers that 
segregated in at least one of the five populations produced 
a genetic map comprised of 20 consensus linkage groups 
that spanned 2,523.6 cM of Kosambi map distance. A 
total of 1,849 markers, including 1,015 SSRs, 709 RFLPs, 
73 RAPDs, 24 classical traits, six AFLPs, ten isozymes, 
and 12 others, were integrated to form the current map 
(Table 2). Four SSRs remained unlinked, as did 40 of the
RFLP loci. Among the 1,849 markers, a total of 420 SSR
and 66 RFLP have been added to the map since the report
by Cregan et al. (1999a). The numbers of SSRs mapped
per linkage group averaged about 51, but varied from 35 
to 64. The average length of the interval between any two 
adjacent SSR markers was 2.5 cM. The primer sequences 
for all SSR loci, as well as genetic maps of each of the 20 
consensus linkage groups, are available on the SoyBase 
Web site of the USDA, ARS Soybean Genome Database 
(http://soybase.agron.iastate.edu/). Additional details can
be found on the corresponding author's Web site
http://bldg6.arsusda.gov/~pooley/soy/cregan/soy_map1.html.



Distribution of SSR markers among 
and within linkage maps 

The MN map presented in Cregan et al. (1999a) had 36
intervals of greater than 20 cM in which there was no SSR
locus. With JoinMap integration and SSR markers from
the other four populations, the gaps in the MN map were
filled with 76 markers previously mapped by Cregan et al.
(1999a) in the MS and CH populations. In the current
study, 90 of the 420 new SSR loci developed either
randomly or by targeting mapped to 30 of the 36 intervals.
Six of the 36 intervals, C2 Satt202-Satt371; D1a Satt531-
Satt368; D1b Satt542-Satt412; H Satt353-Satt192; I top-
Satt571; and O Sat_109-Scaa001 still contain no new SSR
markers. The number of markers mapped to the remaining
30 intervals varied from one to eight (Table 3).

Chi-square tests of the number of markers mapped to
each linkage group indicated a significant deviation from
that anticipated based upon linkage group length
(χ²=36.7, P<0.05). However, this deviation was mainly
due to fewer and greater numbers of new SSRs mapping
to linkage groups B1 and G, respectively (Fig. 1). Indeed,
a recalculation of the chi-square, with the G and B1 linkage
groups excluded from the analysis, indicated similar SSR
marker density among the 18 remaining linkage groups.
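
The comparison described here, marker counts per linkage group against the
counts expected if markers were spread in proportion to group length, can be
sketched as follows. The function name and the three-group data set are
hypothetical and serve only to show the calculation; they are not values
from Table 2.

```python
def chi_square_vs_length(counts, lengths_cm):
    """Chi-square statistic for marker counts against expectations that
    are proportional to linkage-group length; returns (statistic, df)."""
    total_count = sum(counts)
    total_length = sum(lengths_cm)
    statistic = 0.0
    for observed, length in zip(counts, lengths_cm):
        expected = total_count * length / total_length
        statistic += (observed - expected) ** 2 / expected
    return statistic, len(counts) - 1


# Hypothetical example: three equally long groups with unequal counts.
stat, df = chi_square_vs_length([30, 50, 20], [100.0, 100.0, 100.0])
print(f"chi-square = {stat:.1f} on {df} degrees of freedom")
```

The statistic is then compared with the chi-square distribution on (number of
groups - 1) degrees of freedom; dropping the groups that contribute most and
recomputing is the kind of recalculation described in the paragraph above.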

The randomness of SSR-marker distribution within 
linkage groups was also examined. Observed and theo- 
retical distributions of map distances between adjacent 
SSR markers were not completely congruent; the Monte 
Carlo estimate of the exact P-value based on a Monte
Carlo sample of size 10,000 is less than 0.01. As
indicated in Fig. 2, there were large differences in the 
observed and expected frequencies of the cases in which 
adjacent SSR markers were separated by 0.5 cM or by 
1.0 cM. The observed and the expected numbers were 
Table 3 Number of new SSR loci mapped to genomic intervals of
at least 20 cM that contained no SSR markers in the soybean
genetic map reported by Cregan et al. (1999a)

Linkage  Flanking SSR loci or   No. of previously existing   No. of new
group    linkage-group end      markers positioned via       markers
                                JoinMap analysis

A1       Satt050-Satt385        0                            4
         Satt424-Sat_115        2                            1
B1       Top-Satt509            1                            4
         Satt197-Satt298        2                            2
         Sat_123-Satt453        2                            1
B2       Satt577-Satt126        2                            2
         Satt126-Sct_034        0                            2
         Satt534-Satt560        2                            1
C1       Soygpatr-Satt578       1                            4
         Sat_042-Satt524        1                            6
C2       Sat_130-Sat_062        4                            3
         Satt291-Satt170        4                            2
         Satt202-Satt371        1                            0
D1a      Satt531-Satt368        0                            0
         Sat_036-Satt071        2                            2
D1b      Sat_096-Satt095        1                            3
         Satt542-Satt412        3                            0
         Sat_069-Satt459        2                            1
D2       Satt301-Sat_086        5                            2
E        Satt384-Satt598        9                            7
F        Satt522-Sat_074        1                            3
G        Satt288-Satt472        0                            3
H        Satt353-Satt192        3                            0
I        Top-Satt571            0                            0
J        Sct_046-Satt456        6                            8
         Satt215-Satt244        3                            3
K        Sat_043-Satt475        1                            1
         Satt260-Sat_020        0                            4
L        Satt462-Satt481        0                            4
M        Satt150-Satt567        0                            1
N        Top-Satt159            1                            3
         Satt387-Satt521        1                            1
O        Satt445-Satt259        0                            5
         Satt347-Satt262        10                           5
         Satt123-Satt243        6                            2
         Sat_109-Scaa001        0                            0




Fig. 1 Observed and theoretical distribution of simple sequence
repeat (SSR) markers in linkage groups based on the ratio of
mapped SSR markers to linkage group length (cM); x-axis:
linkage group; series: observed and theoretical



187 versus 133 and 148 versus 113, respectively. These 
data indicated that more SSRs than average are closely 
linked, thus suggesting some degree of SSR-marker 
clustering. 
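
To make the size of this excess concrete, the sketch below computes, for the
two bins quoted above, the ratio of observed to expected counts and each
bin's contribution to a chi-square statistic, (O - E)^2 / E. Only the four
counts come from the text; the helper name is hypothetical, and the full test
in the study used all distance bins.

```python
def bin_contribution(observed, expected):
    """Contribution of one distance bin to a chi-square statistic."""
    return (observed - expected) ** 2 / expected


# Observed and expected numbers of adjacent SSR pairs, as quoted in the text.
bins = {"0.5 cM": (187, 133), "1.0 cM": (148, 113)}

for label, (obs, exp) in bins.items():
    print(f"{label}: observed/expected = {obs / exp:.2f}, "
          f"chi-square contribution = {bin_contribution(obs, exp):.2f}")
```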



Fig. 2 Theoretical and observed distribution of Kosambi map
distance between adjacent SSR markers (summarized over all
linkage groups); x-axis: distance



Discussion 

We designed primers to 133 sequences with niicrosatellite 
repeats derived from ESTs, but only 24 (18,0%) of those 
primer sets produced useful polymorphic markers. In 
contrast, when genomic DNA sequences were used as the 
source of SSR-containing sequences, 43.0% yielded 
markers that were polymorphic with respect to the 
genotypes of VMinsoy', 'Noir T, and 'Archer\ Markers 
derived from genomic libraries also contained more 
repeat units as well as a greater range of allele sizes 
and genetic diversity than markers isolated from EST 
libraries. The striking difference of polymorphism be- 
tween the soybean SSRs derived Ixom the two sources is 
consistent with differences reported in rice (Temnykh et 
al. 1999; Cho et al. 2000), sugarcane (Cordeiro et al. 
2001), tomato (Arshchenkova and Ganal 2002), wheat 
(Eujayl et al. 2002), and barley (Thiel et al 2003), For 
example, Arshchenkova and Ganal (2002) reported that 
only 20 of 27,000 tomato ESTs contained microsatellites 
of more than ten repeat units. EST-derived microsatellites 
were generally shorter (7.3 repeat units) than genomic
DNA-derived microsatellites (22.7 repeat units) in barley
(Ramsay et al. 2000). The average number of repeats 
from EST-derived and genomic DNA-derived SSRs was 
6.1 versus 13.7 in sugarcane (Cordeiro et al. 2001). The
expansion or contraction of dinucleotide repeat length in
exons is likely suppressed due to the deleterious
nature of the frame-shift mutation that would frequently 
result in translated regions. Microsatellite markers de- 
rived from repeat arrays in genes are reported to be 
significantly less polymorphic than markers generated 
from longer arrays (Smulders et al. 1997). Other factors 
such as selection against large alteration in coding DNA 
or even a closely associated sequence that may play a role 
in gene expression could constrain microsatellite expan- 
sion or contraction. Such constraints could contribute to 
the reduced polymorphism of microsatellites in ESTs. To

Table 4 Position of repeat motif in the sequenced genes from which polymorphic SSR markers were developed

GenBank accession number  Description                                                                    Motif         Repeat position
AB002807                  Glycine max DNA for nodulin 35                                                 (AT)14        Boundary 5' upstream sequences
AF162283                  Glycine max acetyl-CoA carboxylase (accB-1) gene                               (CT)11        5' Untranslated region
AF186183                  Glycine max retrovirus-like element Calypso2-1                                 (ATT)22       Boundary 5' upstream sequences
X53404                    Glycine max glycin A(1a)B(1b) and A(2)B(1a) boundary DNA                       (AT)25        Intragenic
X17120                    Soybean actin SAc7 gene                                                        (CT)16        Boundary 5' upstream sequences
X16876                    Soybean ENOD2B gene                                                            (AT)17        Intragenic
X01425                    Soybean pseudogene for leghemoglobin                                           A18           Intron
X07159                    Soybean pseudogene for heat shock protein Gmhsp17.9-D (class VI)               (AT)9         Boundary 5' upstream sequences
V00458                    Glycine max gene encoding ribulose-1,5-bisphosphate carboxylase small subunit  A20           Intron
X56139                    Soybean ac514 gene for lipoxygenase                                            (AT)13        Boundary 5' upstream sequences
L23833                    Soybean glutamine phosphoribosyl pyrophosphate amidotransferase                (CTT)6(CTT)4  5' Untranslated region
M11317                    Soybean (Glycine max) low MW heat shock protein gene (Gmhsp17.6-L)             (AT)15        Boundary 5' upstream sequences
V00452                    Glycine max leghemoglobin gene                                                 (AT)26        Intron
M94764                    Glycine max nodulin gene                                                       (AT)24        Intron
J02746                    Glycine max SbPRP1 gene encoding a proline-rich protein                        (ATT)20       Boundary 5' upstream sequences

gain further understanding of the position of SSRs in and 

around functioning genes, the position of SSRs in the 15 
genes from which we have developed polymorphic
markers was determined (Table 4). In two instances, the
SSR was located in 5' UTR sequence, while in all others, 
they were located in either 5' boundary sequence (seven 
cases), introns (four cases), or in intragenic sequence (two 
cases). This suggested that even when polymorphic SSRs 
were discovered in genic or perigenic regions, the
SSR-repeat sequence changes occur only infrequently in
mRNA. Obviously, EST-sequence data provide a convenient
source of SSR-containing sequences that may be
easily and inexpensively exploited. However, even in 
species with large EST collections, relatively few informative
SSR loci are likely to result from this source.
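
The tally reported above can be reproduced directly from Table 4. The following is a minimal sketch, assuming the repeat positions are transcribed row by row from the table with shortened labels; the variable names are illustrative. It also recomputes the yield of polymorphic markers from the EST-derived primer sets (24 of 133).

```python
from collections import Counter

# Repeat positions transcribed from Table 4, one entry per gene (15 genes).
# "boundary" stands for "Boundary 5' upstream sequences".
positions = [
    "boundary", "5' UTR", "boundary", "intragenic", "boundary",
    "intragenic", "intron", "boundary", "intron", "boundary",
    "5' UTR", "boundary", "intron", "intron", "boundary",
]

tally = Counter(positions)
for position, count in tally.most_common():
    print(f"{position}: {count}")

# Yield of polymorphic markers from EST-derived primer sets (24 of 133).
est_yield = 24 / 133
print(f"EST-derived yield: {est_yield:.1%}")
```

The counts agree with the text: two SSRs in 5' UTR sequence, seven in 5' boundary sequence, four in introns, and two in intragenic sequence.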

Clustering of SSR markers on the soybean map was 
observed. Similar clustering of SSR markers was also 
reported in the tomato (Broun and Tanksley 1996; 
Areshchenkova and Ganal 1999) and rice linkage maps 
(McCouch et al. 2003). Physical clustering of SSR
markers was also reported in the rat radiation hybrid
map (Watanabe et al. 1999) and in barley (Cardle et al.
2000). Morgante et al. (2002) and Cardle et al. (2000)
indicated that microsatellites are significantly associated 
with the low-copy fraction of plant genomes based on the 
estimation of microsatellite density in Arabidopsis thaliana,
rice, soybean, maize, and wheat. Among these
species, the overall frequency of microsatellites was
inversely related to genome size and to the proportion of
repetitive DNA. This suggests that most microsatellites
reside in regions predating the recent genome expansions
in many plants. In order to investigate the distribution of
SSRs per megabase (Mb) on each of the 12 rice chromosomes,
McCouch et al. (2003) divided the total number of
SSRs mapped to each chromosome by the total length of 
genomic sequence available for each. The figures were 
compared to the number of EST clusters/Mb on each 
chromosome identified by The Institute for Genomic 



Research's Oryza gene index using the same genomic
sequence data. The density of genes was approximately 
ten times the density of newly developed SSR markers, 
but there was a significant correlation (r=0.45, P<0.015)
between the number of genes/Mb and the number of SSRs/Mb
at the level of the chromosome. The clustering of SSR
loci we have observed in this report may correspond to 
gene-rich regions of soybean. However, as a result of the 
shorter length criteria used to define a microsatellite in 
studies such as Morgante et al. (2002) versus that used
here, this conclusion remains tentative. 
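
The per-chromosome comparison described above (SSRs/Mb against genes/Mb, followed by a correlation) can be sketched as follows. This is an illustration only: the counts and sequence lengths are invented placeholder values, not data from McCouch et al. (2003), and the helper names `density_per_mb` and `pearson_r` are assumptions made for this example.

```python
import math


def density_per_mb(counts, lengths_mb):
    """Divide each chromosome's count by its sequenced length in Mb."""
    return [count / length for count, length in zip(counts, lengths_mb)]


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


# Hypothetical placeholder values for four chromosomes (NOT published data).
ssr_counts = [120, 90, 150, 60]
gene_cluster_counts = [1300, 800, 1400, 700]
sequence_mb = [30.0, 25.0, 32.0, 20.0]

ssr_density = density_per_mb(ssr_counts, sequence_mb)
gene_density = density_per_mb(gene_cluster_counts, sequence_mb)
r = pearson_r(ssr_density, gene_density)
print(f"r between SSRs/Mb and genes/Mb: {r:.2f}")
```

With real data, a significance test on r would be needed before drawing the kind of conclusion reported for rice; the sketch stops at the coefficient itself.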

Thirty of the 36 intervals in the MN population 
described by Cregan et al. (1999a) that then contained no
SSR marker now have at least one, and often several, SSR 
markers based on the results of our present study. In many 
instances, new markers were positioned in these intervals 
as a result of SSRs obtained from genomic clones, but 127 
were developed from BAC clones with the intention of 
targeting specific genomic intervals. The proportion of 
SSRs that mapped to the linkage groups to which they 
were targeted was much higher (64%) when BAC clones 
were identified using PCR primers to SSR-flanking
regions than when the BAC clones were identified with 
RFLP probes (39.6%). The higher efficiency of targeting 
when using SSR rather than RFLP probes is likely due to 
the greater specificity of PCR versus hybridization. The
difference is also consistent with the report that RFLP 
probes hybridize, on average, to 2.55 map positions in the 
soybean genome (Shoemaker et al. 1996), and that the 
multiple fragments detected by RFLP frequently occur on 
different linkage groups (Keim et al. 1990). If only 1 of
every 2.55 hybridizations per probe were to the targeted
position in the genome, then approximately 39% of the
BACs identified would be from the desired position in the
genome. Thus, our results suggest that hybridization was
about as successful as would be anticipated for the
identification of a BAC clone from a specific position in
the soybean genome. Although targeting was more
successful when SSRs were used to identify BAC clones,
the duplicated nature of the genome still interfered with 
the efficiency of BAC clone identification despite the 
greater specificity of PCR. This is likely a reflection of
the fact that at least some regions of the soybean genome
share very high levels of sequence homology (Zhu et al.
1994; Shoemaker et al. 1996). This may make the
development of locus-specific markers to these duplicated 
regions extremely difficult. 
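
The expected success rate of RFLP-based targeting discussed above follows from a one-line calculation. The sketch below uses only figures quoted in the text (2.55 map positions per probe, 39.6% observed with RFLP probes, 64% observed with SSR-flanking primers); the variable names are illustrative.

```python
# Average number of map positions per RFLP probe (Shoemaker et al. 1996).
positions_per_probe = 2.55

# If only one of those positions is the targeted one, the expected share of
# BAC clones from the desired position is the reciprocal.
expected_rflp = 1 / positions_per_probe

observed_rflp = 0.396  # observed with RFLP probes
observed_ssr = 0.64    # observed with PCR primers to SSR-flanking regions

print(f"expected RFLP targeting rate: {expected_rflp:.1%}")
print(f"observed RFLP targeting rate: {observed_rflp:.1%}")
print(f"observed SSR-primer targeting rate: {observed_ssr:.0%}")
```

The expected and observed RFLP rates differ by less than half a percentage point, which is the basis for the statement that hybridization performed about as well as could be anticipated.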